Method and Apparatus for Scalable Encoding and Method and 
Apparatus for Scalable Decoding 

Field of the Invention

The present invention relates to audio and/or video
encoders/decoders and, in particular, to encoders/decoders
that provide scalability.

Background of the Invention and Prior Art 

State-of-the-art audio encoding methods, such as MPEG Layer
3 (MP3) or MPEG AAC, use transforms, for example the
so-called modified discrete cosine transform (MDCT), so as
to obtain a block-wise frequency representation of an audio
signal. Such an audio encoder usually receives a stream of
time-discrete audio sampled values. The stream of audio
sampled values is windowed so as to obtain a windowed block
of, for example, 1024 or 2048 windowed audio sampled values.
Various window functions are used for windowing, for
example a sine window.

The windowed time-discrete audio sampled values are then
converted into a spectral representation by means of a
filter bank. In principle, a Fourier transform or, for
special reasons, a variant of the Fourier transform, for
example an FFT or, as mentioned above, an MDCT, may be used.
The block of audio spectral values at the output of the
filter bank may then be subjected to further processing as
required. In the above-mentioned audio encoders, the audio
spectral values are subsequently quantized, with the
quantizer steps typically being selected such that the
quantizing noise introduced by the quantizing lies below
the psycho-acoustic masking threshold, i.e. is "masked
away". Quantizing is a lossy encoding. In order to obtain a
further reduction of the data amount, the quantized spectral
values are then subjected to entropy encoding by means of
Huffman encoding. By adding side information, such as scale
factors, a bit stream multiplexer forms from the
entropy-encoded quantized spectral values a bit stream,
which may be stored or transmitted.

In the audio decoder, the bit stream is split into encoded
quantized spectral values and side information by means of
a bit stream demultiplexer. The entropy-encoded quantized
spectral values are first entropy-decoded so as to obtain
the quantized spectral values. The quantized spectral values
are then inversely quantized so as to obtain decoded
spectral values comprising quantizing noise which, however,
lies below the psycho-acoustic masking threshold and will
therefore not be heard. These spectral values are then
converted into a time representation by means of a synthesis
filter bank so as to obtain time-discrete decoded audio
sampled values. In the synthesis filter bank, a transform
algorithm inverse to the transform algorithm of the encoder
has to be employed. Moreover, after the frequency-time
retransform, the windowing has to be cancelled.

In order to obtain good frequency selectivity,
state-of-the-art audio encoders typically use block
overlapping. Such a case is represented in Fig. 10a. At
first, for example 2048 time-discrete audio sampled values
are taken and windowed by a means 402. The window embodied
by the means 402 has a window length of 2N sampled values
and provides a block of 2N windowed sampled values at its
output. In order to obtain a window overlap, a second block
of 2N windowed sampled values is formed by a means 404
which, merely for the sake of clarity, is represented
separately from the means 402 in Fig. 10a. The 2048 sampled
values fed into the means 404, however, are not the
time-discrete audio sampled values immediately following
the first window, but include the second half of the
sampled values windowed by the means 402 and additionally
include only 1024 new sampled values. In Fig. 10a, the
overlapping is symbolically represented by a means 406,
which causes a degree of overlapping of 50%. Both the 2N
windowed sampled values output by the means 402 and the 2N
windowed sampled values output by the means 404 are then
subjected to the MDCT algorithm by a means 408 and a means
410, respectively. The means 408 provides N spectral values
for the first window in accordance with the prior art MDCT
algorithm, while the means 410 also provides N spectral
values, however for the second window, with an overlap of
50% existing between the first window and the second
window.

In the decoder, the N spectral values of the first window,
as shown in Fig. 10b, are fed to a means 412, which carries
out an inverse modified discrete cosine transform. The same
applies to the N spectral values of the second window,
which are fed to a means 414, which also carries out an
inverse modified discrete cosine transform. The means 412
and the means 414 each provide 2N sampled values, for the
first window and for the second window, respectively.

A means 416, which is referred to as TDAC (TDAC = time
domain aliasing cancellation) in Fig. 10b, takes into
account the fact that the two windows overlap. In
particular, a sampled value y1 of the second half of the
first window, i.e. with an index N+k, is summed with a
sampled value y2 from the first half of the second window,
i.e. with an index k, such that N decoded time sampled
values result at the output, i.e. in the decoder.
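
The overlap-and-add step can be illustrated with a small numerical sketch. The following Python code is not taken from the cited prior art. It assumes a sine window, a block length of N = 4, an inverse MDCT scaled by 2/N, and a second application of the window before the addition (a common arrangement that the description above does not spell out). All function names are hypothetical.

```python
import math

def sine_window(N):
    # Window of length 2N following the first half wave of the sine.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    # 2N windowed samples in, N spectral values out (cf. means 408/410).
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(spec, N):
    # N spectral values in, 2N aliased samples out (cf. means 412/414).
    # The factor 2/N is an assumed normalization that suits the windowed case.
    return [2.0 / N * sum(spec[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def tdac_reconstruct(x, N):
    # Encode 50%-overlapping blocks and add the overlapping halves (cf. means 416).
    w = sine_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        frame = [x[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out

signal = [float((7 * i) % 5 - 2) for i in range(16)]
decoded = tdac_reconstruct(signal, 4)
# Samples covered by two overlapping windows (indices 4..11) are recovered.
```

Only samples covered by two overlapping windows are reconstructed; the first and last N samples of the toy signal would need a further block on each side.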

It should be appreciated that, by means of the function of
the means 416, which may also be referred to as an add
function, the windowing carried out in the encoder
schematically represented by Fig. 10a is automatically
taken into account, such that no explicit "inverse
windowing" has to take place in the decoder represented by
Fig. 10b.

If the window function implemented by the means 402 or 404
is designated w(k), with the index k representing the time
index, the condition has to be fulfilled that the squared
window weight w(k) added to the squared window weight
w(N+k) equals unity, i.e. w(k)^2 + w(N+k)^2 = 1, with k
ranging from 0 to N-1. If a sine window is used, the window
weights of which follow the first half wave of the sine
function, this condition is always fulfilled, since the
square of the sine and the square of the cosine always add
up to the value 1 for each angle.
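
The window condition can be checked numerically. The following Python sketch is illustrative only; the function names and the tolerance are assumptions and do not appear in the cited prior art.

```python
import math

def sine_window(N):
    # Window weights w(k), k = 0 .. 2N-1, following the first half wave of the sine.
    return [math.sin(math.pi * (k + 0.5) / (2 * N)) for k in range(2 * N)]

def fulfils_window_condition(w, N, tol=1e-12):
    # Checks w(k)^2 + w(N+k)^2 = 1 for k = 0 .. N-1.
    return all(abs(w[k] ** 2 + w[N + k] ** 2 - 1.0) <= tol for k in range(N))

sine_ok = fulfils_window_condition(sine_window(1024), 1024)
rect_ok = fulfils_window_condition([1.0] * 2048, 1024)  # rectangular window fails
```

A rectangular window of constant weight 1 is included as a counter-example, since there the two squared weights add up to 2.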

A disadvantage of the windowing method described in Fig.
10a with a subsequent MDCT function is the fact that the
windowing is achieved by multiplying a time-discrete sampled
value by a floating-point number, taking a sine window as
an example, since the sine of an angle between 0 and 180
degrees, apart from the angle of 90 degrees, does not
result in an integer. Even if integer time-discrete sampled
values are windowed, floating-point numbers will result
after windowing.

Therefore, even if no psycho-acoustic encoder is used, i.e.
if lossless encoding is to be achieved, quantizing is
necessary at the output of the means 408 and 410,
respectively, so as to be able to carry out a reasonably
manageable entropy-encoding process.

If, therefore, known transforms, as have been described
with reference to Fig. 10a, are to be employed for lossless
audio encoding, either a very fine quantizing has to be
employed so that the error resulting from the rounding of
the floating-point numbers can be neglected, or the error
signal has to be additionally encoded, for example in the
time domain.

Concepts of the first kind, that is, concepts in which the
quantization is so fine that the error resulting from the
rounding of the floating-point numbers is negligible, are
disclosed, for example, in the German patent application DE
197 42 201 C1. Here, an audio signal is converted into its
spectral representation and quantized so as to obtain
quantized spectral values. The quantized spectral values
are inversely quantized again, converted into the time
domain, and compared to the original audio signal. If the
error, meaning the error between the original audio signal
and the quantized/inversely quantized audio signal, lies
above an error threshold, the quantizer is set more finely
in a feedback-like manner, and the comparison is then
carried out anew. The iteration is finished when the error
falls below the error threshold. Any residual signal still
existing is encoded with a time domain encoder and written
into a bit stream which, in addition to the time
domain-encoded residual signal, also includes encoded
spectral values which have been quantized in accordance
with the quantizer settings available at the time the
iteration was terminated. It should be appreciated that the
quantizer used does not have to be controlled by a
psycho-acoustic model, so that the encoded spectral values
are typically quantized more precisely than would be
required on the basis of the psycho-acoustic model.

In the technical publication "A Design of Lossy and
Lossless Scalable Audio Coding", T. Moriya et al., Proc.
ICASSP, 2000, a scalable encoder is described which
comprises, as a first lossy data compression module, an
MPEG encoder, for example, which has a block-wise digital
waveform as an input signal and which generates the
compressed bit code. In a local decoder, which is also
present, the encoding is undone, and an encoded/decoded
signal is generated. This signal is compared to the original
input signal by subtracting the encoded/decoded signal from
the original input signal. The error signal is then fed
into a second module, where a lossless bit conversion is
used. This conversion has two steps. The first step
consists in converting a two's complement format into a
sign-magnitude format. The second step consists in
converting a vertical magnitude sequence into a horizontal
bit sequence in a processing block. The lossless data
conversion is carried out so as to maximize the number of
signals or to maximize the number of succeeding zeroes in a
sequence, so as to achieve as good a compression as
possible of the time error signal, which is available as a
result of the digital numbers. This principle is based on a
Bit Slice Arithmetic Coding scheme (BSAC scheme), which is
represented in the technical publication "Multi-Layer Bit
Sliced Bit Rate Scalable Audio Coder", 103rd AES Convention,
pre-print No. 4520, 1997.

The above-mentioned BSAC publication discloses an encoder
roughly as represented in Fig. 8. A time signal is fed into
a block 80, which is designated "Windows" and
time-frequency translation. Typically, an MDCT (MDCT =
modified discrete cosine transform) is used in block 80.
Thereupon, the MDCT spectral values generated by the block
80 are quantized in a block 82 so as to obtain quantized
spectral values in binary form. The quantizing by the block
82 is controlled by a means 84 calculating a masking
threshold using a psycho-acoustic model, with the
quantizing in block 82 being carried out such that the
quantizing noise remains below the psycho-acoustic masking
threshold. In block 85, the quantized spectral values are
then arranged on a bit-wise basis, such that the bits of
equal order of the quantized spectral values are arranged
in one column. In block 86, scaling layers are then formed,
with one scaling layer corresponding to one column. A
scaling layer therefore comprises the bits of equal order
of all quantized spectral values. Subsequently, each
scaling layer is successively subjected to arithmetic
encoding (block 87), and the scaling layers output by block
87, in their redundancy-encoded form, are fed to a bit
stream formation means 88, which provides at its output the
scaled/encoded signal which, apart from the individual
scaling layers, also includes side information, as is
known.

Generally speaking, the prior art scalable BSAC encoder
takes the highest order bits of all spectral values
quantized in accordance with psycho-acoustic aspects,
subjects them to arithmetic encoding and then writes them
into the bit stream as a first scaling layer. Typically,
since only very few very large spectral values are present,
very few quantized spectral values will have a highest
order bit equal to "1".

For generating the second scaling layer, the bits of the
second highest order of all spectral values are taken,
subjected to arithmetic encoding and then written into the
bit stream as a second scaling layer. This procedure is
repeated until the bits of the lowest order of all
quantized spectral values have been arithmetically encoded
and written into the bit stream as a last scaling layer.
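
The formation of scaling layers from bits of equal order can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the quantized spectral values are taken as non-negative magnitudes, the arithmetic encoding of block 87 is omitted, and the function name and example values are hypothetical.

```python
def form_scaling_layers(quantized_values, num_bits):
    # One scaling layer per bit order, highest order first.
    # Each layer holds exactly one bit of every quantized spectral value.
    layers = []
    for order in range(num_bits - 1, -1, -1):
        layers.append([(value >> order) & 1 for value in quantized_values])
    return layers

# Four quantized magnitudes represented with 4 bits each:
# 5 = 0101, 1 = 0001, 12 = 1100, 0 = 0000
layers = form_scaling_layers([5, 1, 12, 0], num_bits=4)
# layers[0] is the first scaling layer (highest order bits),
# layers[-1] is the last scaling layer (lowest order bits).
```

In this toy example only one of the four values has its highest order bit set, which mirrors the remark above that the first scaling layer typically contains very few bits equal to "1".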

Fig. 9 shows a scalable decoder for decoding scaled/encoded
signals generated by the scalable encoder shown in Fig. 8.
The scalable decoder includes a bit stream deformatting
means 90, a scaling layer extraction/decoding means 91, an
inverse quantizing means 92 as well as a frequency
domain/time domain translation means 93 so as to obtain a
decoded signal, the quality of which is proportionally
dependent on the number of scaling layers selected by the
means 91.

In detail, the bit stream deformatting means 90 unpacks the
bit stream and provides the various scaling layers in
addition to the side information. First, the means 91
arithmetically decodes and stores the first scaling layer.
Then, the second scaling layer is arithmetically decoded
and stored. This procedure is repeated until either all
scaling layers contained in the scaled/encoded signal have
been arithmetically decoded and stored, or until the number
of scaling layers requested via a control input 94 has been
decoded and stored. Thus, the binary patterns for each
individual quantized spectral line are successively
generated, with these quantized spectral values, which are
represented in binary form, being subjected to the inverse
quantization 92 in consideration of a scale factor etc. so
as to obtain inversely quantized spectral values, which are
then translated into the time domain by the means 93 so as
to obtain the decoded signal.

When decoding, one bit for each spectral value is thus
obtained with each scaling layer. The bits available for
each spectral line after decoding five scaling layers are
the uppermost five bits. It should be appreciated that, in
the case of very small spectral values, the most
significant bit of which only comes in fifth place, the MSB
(MSB = most significant bit) of this spectral line will not
be available until five scaling layers have been decoded,
so that, for a more precise representation of this spectral
line, further scaling layers have to be processed.
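
The successive build-up of the binary patterns in the decoder can be sketched as follows. The Python code below is illustrative only: arithmetic decoding and inverse quantization are left out, and the function name, the parameters and the example layers are assumptions.

```python
def decode_scaling_layers(layers, num_bits, num_lines):
    # Rebuild the quantized magnitudes from the scaling layers received so far.
    # layers[0] carries the highest order bits; missing lower layers stay zero.
    values = [0] * num_lines
    for index, layer in enumerate(layers):
        order = num_bits - 1 - index
        for line, bit in enumerate(layer):
            values[line] |= bit << order
    return values

# Scaling layers of the 4-bit magnitudes 5, 1, 12 and 0, highest order first.
all_layers = [[0, 0, 1, 0], [1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0]]

full = decode_scaling_layers(all_layers, num_bits=4, num_lines=4)
coarse = decode_scaling_layers(all_layers[:2], num_bits=4, num_lines=4)
```

With only two of the four layers, the small value 1 is still decoded as zero, because its most significant bit sits in the last layer.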

The binary representation of spectral values means that,
with the MDCT spectral values being amplitude values, for
example, each additional bit stands for a precision gain of
6 dB for the spectral line.

Thus, each additional scaling layer results in an increase
in precision of all spectral values by 6 dB.
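
The figure of 6 dB per bit follows from the fact that one additional bit doubles the amplitude resolution. A minimal Python sketch, with a hypothetical function name:

```python
import math

def precision_gain_db(additional_bits):
    # Each additional bit halves the amplitude step, i.e. a factor of 2 per bit.
    return 20.0 * math.log10(2.0 ** additional_bits)

one_bit = precision_gain_db(1)    # roughly 6.02 dB, rounded to 6 dB in the text
five_bits = precision_gain_db(5)  # roughly 30.1 dB
```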

Considering that, at least for noisy signals, the masking
threshold of hearing lies only approximately 6 dB below the
signal, it becomes apparent that the bit-wise scaling
provided by the prior art encoder/decoder concept is
problematic in terms of precision, in particular for an
efficient encoding of the signal portions which are only
just audible, that is, for example, for the lower bits of
the spectral values quantized in accordance with
psycho-acoustic aspects.

If, for example, owing to a bottleneck in the transmission
channel, the lowest scaling layer of the scaled/encoded
signal output by block 88 of Fig. 8 is not transmitted,
this results in a precision loss of 6 dB which, in an
unfavourable constellation, leads to clearly audible
interference in the decoded signal.

Summary of the Invention 

It is the object of the present invention to provide a
concept for scalable encoding/decoding by which a finer
scalability may be achieved.



In accordance with a first aspect of the invention, this
object is achieved by an apparatus for scalable encoding a
spectrum of a signal including audio and/or video
information, with the spectrum comprising binary spectral
values, the apparatus comprising: a generator for
generating a first sub-scaling layer using bits of a
certain order of a first number of the binary spectral
values in a band, with the first number being greater than
or equal to "1" and less than a total number of the binary
spectral values in the band, and for generating a second
sub-scaling layer using bits of the certain order of a
second number of the binary spectral values, with the
generator being operative to select the second number of
the binary spectral values such that the number is greater
than or equal to "1" and less than the total number of the
binary spectral values in the band, and to further
determine the second number of the spectral values such
that the number comprises at least one binary spectral
value which is not contained in the first number of binary
spectral values; and a former for forming an encoded
signal, with the former being operative to include the
first sub-scaling layer and the second sub-scaling layer
into the encoded signal such that the first and the second
sub-scaling layer (113a, 113b) are separately decodable
from each other.

In accordance with a second aspect of the invention, this
object is achieved by an apparatus for scalable decoding an
encoded signal comprising a first and a second sub-scaling
layer, with the first sub-scaling layer comprising bits of
a certain order of a first number of binary spectral values
in a band, with the second sub-scaling layer comprising
bits of the certain order of a second number of binary
spectral values in the band, and with the second number
comprising at least one spectral value not contained in the
first number, the apparatus comprising: an extractor for
extracting the first sub-scaling layer from the encoded
signal and the second sub-scaling layer from the encoded
signal; and a processor for processing the first
sub-scaling layer and the second sub-scaling layer so as to
determine the bits of the certain order of the binary
quantized spectral values in the band.
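
In accordance with a third aspect of the invention, this
object is achieved by a method for scalable encoding a
spectrum of a signal including audio and/or video
information, with the spectrum comprising binary spectral
values, the method comprising: generating a first
sub-scaling layer using bits of a certain order of a first
number of the binary spectral values in a band, with the
first number being greater than or equal to "1" and less
than a total number of the binary spectral values in the
band; generating a second sub-scaling layer using bits of
the certain order of a second number of the binary spectral
values, the step of generating comprising selecting the
second number of the binary spectral values such that the
number is greater than or equal to "1" and less than the
total number of the binary spectral values in the band, and
further determining the second number of the spectral
values such that the number comprises at least one binary
spectral value which is not contained in the first number
of binary spectral values; and forming an encoded signal,
the step of forming comprising including the first
sub-scaling layer and the second sub-scaling layer into the
encoded signal such that the first and the second
sub-scaling layers are separately decodable from each
other.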

In accordance with a fourth aspect of the invention, this
object is achieved by a method for scalable decoding an
encoded signal comprising a first and a second sub-scaling
layer, with the first sub-scaling layer comprising bits of
a certain order of a first number of binary spectral values
in a band, with the second sub-scaling layer comprising
bits of the certain order of a second number of binary
spectral values in the band, and with the second number
comprising at least one spectral value not contained in the
first number, the method comprising: extracting the first
sub-scaling layer from the encoded signal and the second
sub-scaling layer from the encoded signal; and processing
the first sub-scaling layer and the second sub-scaling
layer so as to determine the bits of the certain order of
the binary quantized spectral values in the band.

In accordance with a fifth aspect of the invention, this
object is achieved by a computer program having a program
code for carrying out, when the program executes on a
computer, a method of scalable encoding a spectrum of a
signal including audio and/or video information, with the
spectrum comprising binary spectral values, the method
comprising: generating a first sub-scaling layer using bits
of a certain order of a first number of the binary spectral
values in a band, with the first number being greater than
or equal to "1" and less than a total number of the binary
spectral values in the band; generating a second
sub-scaling layer using bits of the certain order of a
second number of binary spectral values, the step of
generating comprising selecting the second number of the
binary spectral values such that the number is greater than
or equal to "1" and less than the total number of the
binary spectral values in the band, and determining the
second number of the spectral values further such that the
number comprises at least one binary spectral value which
is not contained in the first number of binary spectral
values; and forming an encoded signal, the step of forming
comprising including the first sub-scaling layer and the
second sub-scaling layer into the encoded signal such that
the first and the second sub-scaling layers are separately
decodable from each other.

In accordance with a sixth aspect of the invention, this
object is achieved by a computer program having a program
code for carrying out, when the program executes on a
computer, a method of scalable decoding an encoded signal
comprising a first and a second sub-scaling layer, with the
first sub-scaling layer comprising bits of a certain order
of a first number of binary spectral values in a band, with
the second sub-scaling layer comprising bits of the certain
order of a second number of binary spectral values in the
band, and wherein the second number comprises at least one
spectral value not contained in the first number, the
method comprising: extracting the first sub-scaling layer
from the encoded signal and the second sub-scaling layer
from the encoded signal; and processing the first
sub-scaling layer and the second sub-scaling layer so as to
determine the bits of the certain order of the binary
quantized spectral values in the band.

The present invention is based on the finding that
psycho-acoustic masking effects in the frequency domain
occur on a band-wise and not on a line-wise basis, such
that increasing the precision of one spectral line in a
band achieves the same precision gain per band as a uniform
fractional increase in precision across the whole band
would, which, however, is not possible with a bit-wise
division of the scaling layers. In accordance with the
invention, the refinement of the precision scaling is
achieved by subdividing the bit layers into sub-scaling
layers. In contrast to the prior art, in which the bits of
a certain order of all quantized spectral values are put
together to form a scaling layer, in accordance with the
invention the bits of this order of only a part of the
quantized spectral values in the considered band are used
in a first sub-scaling layer. The next sub-scaling layer
then obtains the bits of the same order, now, however, from
quantized spectral values other than those in the first
sub-scaling layer, so as to form the second sub-scaling
layer.

If, for example, a band with m = 4 quantized spectral
values is considered, then, in the prior art, a certain
scaling layer would include the bits of a certain order of
all four spectral lines in the considered band. The next
scaling layer would in turn include all bits of the certain
order less 1 of all quantized spectral lines, such that,
from scaling layer to scaling layer, a precision gain of
6 dB per spectral line results.

In accordance with the invention, the considered scaling
layer is now subdivided into a maximum of m sub-scaling
layers. The first sub-scaling layer would then include only
the bit of the certain order of the first spectral line and
no bits of the second, third and fourth spectral lines. The
second sub-scaling layer would then include the bit of the
certain order of the second quantized spectral line, but no
bit for the first, third and fourth spectral lines. In a
similar manner, the third sub-scaling layer includes the
bit of the certain order of the third spectral line, and
the fourth sub-scaling layer includes the bit of the
certain order of the fourth spectral line of the considered
band. As has been set forth, since masking effects occur on
a band-wise and not on a line-wise basis, each additional
sub-scaling layer provides a precision gain of 6/m dB. This
means that, in the considered example with m = 4, each
sub-scaling layer results in a precision gain of 1.5 dB.
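
The subdivision of one bit layer of a band into sub-scaling layers can be sketched as follows. This Python fragment is a simplified illustration: it assumes non-negative quantized magnitudes, takes the lines in ascending order, and uses the rounded figure of 6 dB per bit from the description. The function names and the parameter lines_per_sub_layer are hypothetical.

```python
def form_sub_scaling_layers(band, order, lines_per_sub_layer=1):
    # Split the bits of one order for a band of m spectral lines into
    # sub-scaling layers; each entry is (line index within band, bit).
    sub_layers = []
    for start in range(0, len(band), lines_per_sub_layer):
        stop = min(start + lines_per_sub_layer, len(band))
        sub_layers.append([(i, (band[i] >> order) & 1) for i in range(start, stop)])
    return sub_layers

def gain_per_sub_layer_db(m, lines_per_sub_layer=1, db_per_bit=6.0):
    # Band-wise precision gain contributed by one sub-scaling layer.
    return db_per_bit * lines_per_sub_layer / m

band = [5, 1, 12, 0]                       # m = 4 quantized magnitudes
subs = form_sub_scaling_layers(band, order=2)
gain = gain_per_sub_layer_db(len(band))    # 1.5 for m = 4
```

Because every sub-scaling layer carries the line indices it covers, a decoder could process any subset of them on its own, which is one simple way to picture the separate decodability required above.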

It should be appreciated that a sub-scaling layer may also
contain the bits of the certain order of more than one
quantized spectral line. In the considered example, if a
sub-scaling layer included the bits of the certain order of
two quantized spectral lines, the precision gain per
sub-scaling layer would no longer be 1.5 dB, but 3 dB
(two of the m = 4 lines, i.e. 2 x 6/4 dB). Generally
speaking, the second number of quantized spectral values,
bits of which are present in the second sub-scaling layer,
is selected such that it is greater than or equal to 1 and
less than the total number of quantized spectral values in
the band, with the second number of spectral values further
comprising at least one quantized spectral value which is
not present in the first number of quantized binary
spectral values, the bits of which are present in the first
sub-scaling layer.
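
The selection rule for the first and second number of spectral values can be written as a simple check. The following Python sketch is only an illustration of the rule as stated in the text; representing each "number" of spectral values as a set of line indices within the band is an assumption, as is the function name.

```python
def valid_sub_layer_selection(first_lines, second_lines, m):
    # first_lines / second_lines: indices of the spectral lines whose bits of
    # the certain order go into the first / second sub-scaling layer.
    first, second = set(first_lines), set(second_lines)
    indices_ok = all(0 <= i < m for i in first | second)
    # Each number is at least 1 and less than the total number m in the band.
    sizes_ok = 1 <= len(first) < m and 1 <= len(second) < m
    # The second number contains at least one line not contained in the first.
    new_line = bool(second - first)
    return indices_ok and sizes_ok and new_line

ok = valid_sub_layer_selection({0, 1}, {1, 2}, m=4)            # True
same = valid_sub_layer_selection({0, 1}, {1}, m=4)             # False: nothing new
whole_band = valid_sub_layer_selection({0, 1, 2, 3}, {1}, m=4) # False: a full scaling layer
```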

In accordance with the invention, there are various
possibilities as to which of the spectral values is
selected for the next sub-scaling layer. If the masking
threshold of hearing is represented on a line-wise basis,
for example (e.g. more precisely than in 6-dB steps), it is
possible to ascertain exactly in the encoder which of the m
spectral lines has so far been the least precise.

In contrast, if the masking threshold of hearing is
represented on a band-wise basis (for example, in 6-dB
steps), then, at the beginning of the encoding of a new
layer, that is, when generating a sub-scaling layer for a
new bit layer, each spectral line has so far been
transmitted with the same precision relative to the masking
threshold of hearing. When selecting the line order in the
sub-layers, however, the values of the spectral lines
transmitted so far may be taken into account. For example,
if the spectral lines with small spectral values are
encoded first in the following sub-layers, a more precise
spectral shaping of the resulting quantizing errors
results.
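
One conceivable way of deriving such a line order is sketched below in Python. It is an assumption made for illustration, not a rule stated in the text: the lines of a band are sorted by the part of each value that has already been transmitted (the bits above the current order), smallest first, so that a decoder holding the same upper bits could derive the same order without side information.

```python
def sub_layer_line_order(band, order):
    # Part of each magnitude known from the layers transmitted so far.
    known = [value >> (order + 1) for value in band]
    # Smallest known value first; ties are broken by line index.
    return sorted(range(len(band)), key=lambda i: (known[i], i))

band = [5, 1, 12, 0]
line_order = sub_layer_line_order(band, order=1)
# Known parts are [1, 0, 3, 0], so lines 1 and 3 come first.
```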

In a preferred embodiment of the present invention,
sub-scaling layers are formed using psycho-acoustically
quantized spectral values, with the certain order of the
bits processed in the sub-scaling layers being constant
across the considered band comprising m spectral lines. In
the case of psycho-acoustically quantized binary spectral
values, all bits of the quantized spectral values have to
be transmitted for a psycho-acoustically transparent
encoding. In this case, a finer scalability is advantageous
especially for the low order bits of the binary quantized
spectral values, so as to enable decoding with a slowly
decreasing quality depending on the number of sub-scaling
layers considered.

In an alternative embodiment of the present invention, the
quantized spectral values are not quantized in
consideration of psycho-acoustic aspects, but are available
within the framework of the computing accuracy of a
computer prior to quantizing. Alternatively, the quantized
spectral values have been generated using an integer MDCT
(IntMDCT), which is described in "Audio Coding Based on
Integer Transforms", Geiger, Herre, Koller, Brandenburg,
111th AES Convention, New York, 2001.

The IntMDCT is especially favourable, since it has the
attractive properties of the MDCT, such as a good spectral
representation of the audio signal, critical sampling and
block overlapping. As has been set forth, the IntMDCT is a
lossless transform, that is, roundings to integer values
during the forward transform can be taken into account by
an inverse rounding operation in the backward transform, so
that no rounding errors occur.
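
How a rounding in the forward transform can be undone exactly in the backward transform may be illustrated with a rotation split into three lifting steps. The text above does not describe the internal structure of the IntMDCT, so the following Python sketch is only an assumed illustration of the principle, with hypothetical function names; the angle must not be a multiple of 180 degrees.

```python
import math

def int_rotate(x, y, alpha):
    # Integer-to-integer approximation of a rotation by alpha,
    # built from three lifting steps with rounding.
    p = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x + round(p * y)
    y = y + round(s * x)
    x = x + round(p * y)
    return x, y

def int_rotate_inverse(x, y, alpha):
    # Same steps in reverse order with subtraction; every rounded term is
    # recomputed from an unchanged value, so the roundings cancel exactly.
    p = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y

rotated = int_rotate(3, -2, 0.3)
restored = int_rotate_inverse(rotated[0], rotated[1], 0.3)  # (3, -2) again
```

Each lifting step changes only one of the two values and leaves the other untouched, which is why the inverse can recompute and subtract exactly the same rounded quantity.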

The IntMDCT spectral values are therefore present in
lossless form, that is, they have not been quantized in
consideration of psycho-acoustic aspects.

For a scaling operation with respect to the psycho-acoustic
masking threshold, it is preferred to determine at least
the most significant bit of the psycho-acoustic masking
threshold for each spectral value and/or for each band, and
to no longer define the certain order of the bits which are
to go into a scaling layer and/or a sub-scaling layer in an
absolute manner, as was the case for the
psycho-acoustically quantized spectral values, but relative
to the corresponding most significant bit of the
psycho-acoustic masking threshold. The certain order for
the bits in a scaling layer is therefore defined relative
to the psycho-acoustic masking threshold, for example in
that a scaling layer encodes those bits of the spectral
values whose order is, for example, greater by 1 than the
MSB of the psycho-acoustic masking threshold for the
corresponding spectral value or, if the psycho-acoustic
masking threshold is provided on a band-wise basis, for the
band in which the spectral value is located. The certain
order for defining the scaling layers in the case of
spectral values which have not been quantized in
consideration of psycho-acoustic laws is therefore a
relative order, related to the MSB of the psycho-acoustic
masking threshold that is relevant for the respective
spectral value.
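
The relative definition of the bit order can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: the masking threshold is given per spectral line as a non-negative integer on the same scale as the spectral magnitudes, and the function names and example values are hypothetical.

```python
def msb_order(value):
    # Order of the most significant set bit; -1 if the value is zero.
    return value.bit_length() - 1

def bits_of_relative_order(magnitudes, thresholds, relative_order):
    # For each spectral line, pick the bit whose order lies `relative_order`
    # above the MSB of the masking threshold relevant for that line.
    bits = []
    for magnitude, threshold in zip(magnitudes, thresholds):
        order = msb_order(threshold) + relative_order
        bits.append((magnitude >> order) & 1 if order >= 0 else 0)
    return bits

magnitudes = [48, 37, 5]
thresholds = [16, 4, 8]          # MSB orders 4, 2 and 3
layer_above = bits_of_relative_order(magnitudes, thresholds, 1)
layer_at_msb = bits_of_relative_order(magnitudes, thresholds, 0)
```

For a band-wise threshold, the same threshold value would simply be repeated for all lines of the band.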

In accordance with the present invention, for a
psycho-acoustically transparent encoding/decoding, it is
preferred to transmit, in individual scaling layers or
sub-scaling layers, all bits of the quantized spectral
values whose order is the same as or higher than the order
of the MSB of the psycho-acoustic masking threshold.

In particular, when defining the scaling layer which is to
include the bits of the quantized spectral values whose
order is the same as that of the most significant bits of
the psycho-acoustic masking threshold, it is preferred to
carry out a subdivision into sub-scaling layers so as to
achieve a better precision scaling at the limit of
audibility of interference, so to speak. If, for example,
the total frequency domain or a part of the frequency
domain is subdivided into bands of, for example, four
spectral values each, and if one spectral value of each of
the resulting bands is transmitted in a sub-scaling layer,
a precision increase of 1.5 dB may be achieved with each
sub-scaling layer.

It should be appreciated that the precision scaling is
freely selectable by setting the size of the bands. If, for
example, eight spectral values are grouped into a band and
if each sub-scaling layer contains only the bit of one
spectral value from this band, a precision scaling of
0.75 dB will be achieved.
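
The relationship between band size and precision step can be
sketched as follows. This is a minimal illustration only,
assuming that one complete bit layer is worth about 6 dB and
that every sub-scaling layer refines the same number of
spectral values per band; the function name
precision_step_db and its parameters are hypothetical and
not part of the described apparatus.

```python
def precision_step_db(band_size: int, values_per_sublayer: int = 1,
                      layer_db: float = 6.0) -> float:
    """Approximate precision gain per sub-scaling layer in dB.

    Assumes a complete bit layer is worth layer_db (about 6 dB) and that
    each sub-scaling layer refines values_per_sublayer spectral values
    out of the band_size values of every band.
    """
    if not 1 <= values_per_sublayer < band_size:
        raise ValueError("need 1 <= values_per_sublayer < band_size")
    return layer_db * values_per_sublayer / band_size


print(precision_step_db(4))      # bands of four spectral values
print(precision_step_db(8))      # bands of eight spectral values
print(precision_step_db(8, 2))   # eight values, two per sub-scaling layer
```

Smaller steps are obtained simply by choosing larger bands,
at the cost of more sub-scaling layers per bit layer.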

One advantage of the inventive concept of sub-dividing a
scaling layer into several sub-scaling layers, which may
nevertheless be extracted and decoded independently of each
other, is that it is compatible with all other existing
scalability options. As an example, mention should be made
of band width scaling, in which, for the acoustically
adapted encoding of audio signals at low bit rates, the
audio band width is usually reduced so that the remaining
spectral values can be represented with sufficient
precision. This channel-dependent band width scaling may
also be implemented in the inventive context of using
sub-scaling layers. To this end, only a frequency domain
with an upper limit is considered in the first layers, and,
with increasing accuracy in the further layers and/or
sub-layers, higher frequency domains which so far have not
been considered are encoded on a step-wise basis.

A further advantage of the inventive concept of the
sub-scaling layers is that it is also compatible with the
context-dependent arithmetic encoding which is also used in
MPEG-4 BSAC. MPEG-4 BSAC is described in "Coding of Audio
Visual Objects, Audio", International Standard 14496-3, 2nd
edition, ISO/IEC Moving Pictures Expert Group, ISO/IEC
JTC1/SC29/WG11, 2001.

The inventive concept is further advantageous in that, on
the side of the decoder, any interpretation of the
quantized value may be carried out. If not all of the bit
layers of the spectrum are transmitted, only the high-order
bits of each spectral value will be available in the
decoder. Moreover, in view of the masking threshold of
hearing, which is transmitted in a preferred embodiment of
the present invention, and in view of the number of
transmitted bit layers, it is possible to determine how
many bits of this spectral value have not been transmitted.
From these data the decoder has to reconstruct a quantized
spectral value. A plausible possibility for this would be
to replace the non-transmitted bits by zeroes. Thus, when
scaling layers are eliminated, the quantizing process will
always result in a rounding towards smaller absolute
values. This type of quantizing, however, will not result
in the smallest possible mean quantizing error. The mean
quantizing error may be reduced by making use of
alternative decoder reconstruction strategies.

Brief Description of the Drawings

Preferred embodiments of the present invention will be
explained below with reference to the attached drawings, in
which:

Fig. 1a shows a block diagram of the inventive encoder;

Fig. 1b shows a schematic representation of a scaled
encoded signal with scaling layers and sub-scaling
layers;

Fig. 2 shows a sub-division of a magnitude spectrum in
bit layers in parallel to the masking threshold
of hearing;

Fig. 3 shows a schematic representation of the
sub-division of Fig. 2 in consideration of the MSB of
the masking threshold;

Fig. 4 shows a schematic representation for illustrating
the selection of a spectral value for the next
sub-scaling layer in a continuously given masking
threshold of hearing;

Fig. 5 shows a schematic representation for illustrating
the selection of a spectral value for a
sub-scaling layer in a band-wise representation of
the masking threshold of hearing;

Fig. 6 shows a detailed block diagram of an inventive
encoder;

Fig. 7 shows a block diagram of an inventive decoder
with IntMDCT;

Fig. 8 shows a block diagram of a prior art BSAC
encoder;

Fig. 9 shows a block diagram of a prior art BSAC
decoder;

Fig. 10a shows a schematic block diagram of a prior art
encoder with MDCT and 50% overlapping;

Fig. 10b shows a block diagram of a prior art decoder for
decoding the values generated by Fig. 10a;

Fig. 11 shows a block diagram of a preferred means for
processing time discrete audio sampled values so
as to obtain integer values, from which integer
spectral values may be averaged out;

Fig. 12 shows a schematic representation of the
decomposition of an MDCT and an inverse MDCT in Givens
rotations and two DCT-IV-operations; and

Fig. 13 shows a representation for illustrating the
decomposition of the MDCT with a 50% overlapping in
rotations and DCT-IV-operations.

Fig. 1a shows a schematic block diagram of an apparatus for
scalable encoding of a spectrum of a signal including audio
and/or video information, with the spectrum comprising
binary spectral values grouped into bands. A band of binary
spectral values of the audio and/or video signal is fed
into an input 100 of the apparatus for scalable encoding of
Fig. 1a. The grouping of binary spectral values into bands
may be effected in any given manner. As has been set forth,
the present invention is based on the fact that masking
effects in the frequency domain occur on a band-wise basis
and not on a spectral-value-wise basis. For this reason it
is preferred to group the binary spectral values into bands
using, for example, the frequency groups (critical bands),
or using bands which are smaller than the frequency groups,
that is, which include fewer spectral values than a
frequency group, such that a psycho-acoustic or
psycho-optical frequency group is divided into, for
example, two or more bands.

A band of binary spectral values of the audio and/or video
signal is fed into a means 102 for generating the
sub-scaling layers, with the means 102 generating a first
sub-scaling layer, a second sub-scaling layer and, if
necessary, further sub-scaling layers. The sub-scaling
layers are output on output lines 104a, 104b... of the means
102 and transmitted to a means 106 for forming the encoded
signal. The means 106 is implemented so as to include the
first sub-scaling layer (TSS) and the second sub-scaling
layer in the encoded signal at an output 108 of the
apparatus shown in Fig. 1a, such that the first and the
second sub-scaling layer may be decoded separately from
each other.

The means 102 for generating the sub-scaling layers uses,
for the first sub-scaling layer, bits of a certain order of
a first number of binary spectral values in a band, with
the first number being greater than or equal to 1 and less
than the total number of binary spectral values in the
band. For generating the second sub-scaling layer, the
means 102 uses bits of the certain order of a second number
of binary spectral values, with the second number being
selected such that it is greater than or equal to 1 and
less than the total number of binary spectral values in the
band, and with the second number of binary spectral values
being determined such that they comprise at least one
binary spectral value which is not included in the first
number of binary spectral values. This means that each
sub-scaling layer, once decoded, results in at least one
spectral value of the considered band being present in the
decoder at a higher precision than if this sub-scaling
layer had not been taken into consideration.
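
The generation of sub-scaling layers from the bits of one
order can be sketched as follows. This is a simplified
illustration under stated assumptions, not the means 102
itself: spectral values are treated as non-negative integer
magnitudes, each sub-scaling layer takes exactly one
spectral value per band, and the spectral value is chosen
simply by its position in the band. The function name
sub_scaling_layers is hypothetical.

```python
def sub_scaling_layers(bands, order):
    """Split the bits of one order into sub-scaling layers.

    bands: list of bands, each a list of non-negative integer magnitudes.
    order: bit position (0 = LSB) whose bits form the scaling layer.
    Returns a list of sub-layers; sub-layer k holds, for every band, the
    bit of the k-th spectral value of that band (None if the band has
    fewer than k + 1 values).
    """
    depth = max(len(band) for band in bands)
    layers = []
    for k in range(depth):
        layer = []
        for band in bands:
            bit = (band[k] >> order) & 1 if k < len(band) else None
            layer.append(bit)
        layers.append(layer)
    return layers


# Two bands of four spectral values each, bits of order 1.
bands = [[5, 2, 7, 0], [1, 6, 3, 4]]
layers = sub_scaling_layers(bands, order=1)
for index, layer in enumerate(layers):
    print(index, layer)
```

Each of the four resulting sub-layers refines a different
spectral value of every band, so any prefix of them can be
decoded on its own.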

Fig. 1b shows a scaled encoded signal as a schematic bit
stream representation. The bit stream representing the
scaled encoded signal first includes side information 110,
which may be implemented as specified by the BSAC standard.
The bit stream then includes a first scaling layer 111, a
second scaling layer 112, a third scaling layer 113, a
fourth scaling layer 114, a fifth scaling layer 115... As
an example only, in the scaled/encoded signal shown in
Fig. 1b, the third scaling layer 113 is subdivided into
four sub-scaling layers (SSL) designated 113a to 113d.
Moreover, as an example only, the fifth scaling layer is
also subdivided into sub-scaling layers, that is, into the
sub-scaling layers 115a, 115b, 115c...

The first scaling layer 111 includes, for example, the bits
of the highest order - either absolute or, as has been set
forth, relative to the psycho-acoustic masking threshold -
of the spectral values of the spectrum of the audio and/or
video signal. The second scaling layer 112, likewise a
complete scaling layer, includes the bits of the spectral
values with an order that is lower by 1.

In total, the third scaling layer includes the bits of an
order of the spectral values that is lower by 2. However,
it is not a complete scaling layer, which could only be
decoded as a whole, but - for a finer precision scaling -
is subdivided into four separately decodable sub-scaling
layers 113a, 113b, 113c, 113d. In the example represented
in Fig. 1b, the total spectrum, that is the total number of
spectral values, is subdivided into bands of four spectral
values each. The first sub-scaling layer 113a then
includes, for each of the bands, the bit of one spectral
value, the order being lower by 3. Analogously, the second
sub-scaling layer includes the bits of the same order,
however, from other spectral values in the individual
bands. The third sub-scaling layer 113c includes the bits
of the same order, again from other spectral values in
each band. The same applies to the fourth sub-scaling
layer. Since bands were selected which include four
spectral values each, each sub-scaling layer has one bit of
a spectral value for each band. This means that each
sub-scaling layer in the example represented in Fig. 1b
comprises a quarter of the number of bits of a complete
scaling layer, such as, for example, the first scaling
layer 111 or the second scaling layer 112.

In the following, a subdivision of the magnitude spectrum
into bit layers running in parallel to the masking
threshold of hearing is explained with reference to Fig. 2.
The spectral values represented by their bit patterns in
Fig. 2 are spectral values as obtained, for example, by the
IntMDCT, which will be explained in detail hereinbelow. The
binary spectral values represented by their bit patterns in
Fig. 2 may also be the result of any time domain/frequency
domain conversion algorithm, such as, for example, an FFT,
represented as binary integers of, in principle, any size.
The binary spectral values represented in Fig. 2 have thus
not yet been quantized using psycho-acoustic aspects.

Further, in Fig. 2, the psycho-acoustic masking threshold
of hearing is plotted as a continuous line designated
0 dB.

The course of the masking threshold of hearing in the
spectrum results in bit layers running in parallel to the
masking threshold of hearing, with the membership of a bit
in a bit layer reflecting the psycho-acoustic or
psycho-optical relevance of this bit. For example, it may
be seen from Fig. 2 that the spectral value designated 1
comprises bits which occupy two bit layers above the
masking threshold of hearing. In contrast, the even greater
spectral value 5 comprises higher-order bits occupying
three bit layers above the masking threshold of hearing.
The spectral values 2, 3, and 4, in contrast, only include
bits lying in a bit layer below the masking threshold of
hearing.

With respect to the psycho-acoustic transparency, that is
the audibility of interferences caused by a quantization
and/or by "leaving out" low-order bits, the masking
threshold of hearing will be referred to as the 0-dB-line.
The psycho-acoustically most significant bit layer, and
thus the first scaling layer in the example shown in
Fig. 2, is the bit layer between 12 dB and 18 dB. Here,
only the spectral value with the number 5 provides a
contribution. The first scaling layer 111 from Fig. 1b
would therefore include only information on the spectral
value 5 in the example shown in Fig. 2.

The second bit layer between 6 dB and 12 dB, that is the
second scaling layer 112 from Fig. 1b, only includes
information on bits of the first spectral value and of the
fifth spectral value, but no information on the other
spectral values, as their MSBs lie in lower bit layers.

In the example shown in Fig. 2, the third scaling layer 113
from Fig. 1b includes the bits between the 0-dB-line and
the +6-dB-line in Fig. 2 and now includes only information
on the sixth, the fifth, and the first spectral line, but
still no information on the other spectral values. If the
third scaling layer in the example of Fig. 2 were processed
as a complete scaling layer, the precision graduation from
the second scaling layer to the third scaling layer would
be very coarse, in that decoding only the first and second
scaling layers - without the third scaling layer - would
lead to strongly audible interferences, whereas taking the
third scaling layer into account would result in hardly
any audible interferences. In accordance with the
invention, a graduation within this range is achieved by
forming sub-scaling layers of the third scaling layer. In
the situation shown in Fig. 2, despite a band division
with, for example, m = 5, two sub-scaling layers would
suffice: a first sub-scaling layer would include the
second-order bit of spectral value No. 1, while a second
sub-scaling layer would include the third-order bit of
spectral value No. 5, with these bits in the sub-scaling
layers for spectral value No. 1 and spectral value No. 5
having the same order relative to the least significant
bits of the masking threshold.

To illustrate these facts, reference will be made to Fig. 3
below. Fig. 3 shows a detailed representation of the
situation in Fig. 2, with the masking threshold of hearing
no longer being plotted by its actual value, as in Fig. 2,
but being represented in Fig. 3 by its most significant
bits.

In accordance with the invention, it has been found that,
for psycho-acoustic transparency and in order to cover
even unfavourable cases, so many bits of a quantized
spectral value have to be transmitted that the order of
the last transmitted bit corresponds to the order of the
most significant bit of the masking threshold associated
with this spectral value. In other words, all bits of a
spectral value - provided they exist - which have a higher
order than the MSB of the masking threshold associated
with this spectral value have to be transmitted, and the
bit of the spectral value having the same order as the MSB
of the masking threshold is to be transmitted as well.
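
The transparency rule stated above can be sketched as
follows. This is an illustrative sketch only, assuming that
both the spectral value and its masking threshold are given
as integers and that bit orders are counted from 0 at the
LSB; the helper names msb_order and transparent_bit_orders
are hypothetical.

```python
def msb_order(value: int) -> int:
    """Order (0 = LSB) of the most significant set bit; -1 for zero."""
    return value.bit_length() - 1


def transparent_bit_orders(spectral_value: int, threshold: int):
    """Orders of the bits to transmit for psycho-acoustic transparency.

    All existing bits above the MSB of the masking threshold are listed,
    followed by the bit having the same order as that MSB.
    """
    lowest = max(msb_order(threshold), 0)
    highest = max(msb_order(abs(spectral_value)), lowest)
    return list(range(highest, lowest - 1, -1))


# Value 0b101101 with a threshold whose MSB has order 2.
print(transparent_bit_orders(0b101101, 0b110))
# A value lying entirely below the threshold MSB.
print(transparent_bit_orders(0b11, 0b110))
```

For a spectral value below the threshold, only the single
bit at the order of the threshold MSB remains to be sent.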

The inventive precision scaling is especially interesting
at the psycho-acoustic masking threshold, that is, for the
bits of the spectral values having the same order as the
MSB of the masking threshold which is associated with the
respective spectral value. In the diagram shown in Fig. 3
these bits are plotted as bold-edged boxes.

Generally speaking, the bit order is plotted in the
vertical direction in Fig. 3, that is, from MSB over
MSB -1, MSB -2, MSB -3, LSB +2, LSB +1 to LSB. However, the
expression "MSB" in Fig. 3 does not designate the MSB of a
certain spectral value or of a psycho-acoustic masking
threshold, but the absolute MSB, that is the maximum
representable power of two in the binary system.

In contrast, the bold-edged boxes represented in Fig. 3
indicate the MSB of the masking threshold of hearing for
the spectral values 1 to 6. In particular, each box is
subdivided by a dotted diagonal, with a bit of a spectral
value above the diagonal and a bit of the masking threshold
for this spectral value below the diagonal. Bits designated
"1" have the value 1. Bits designated "zero" have the value
0. Finally, bits designated "x" have the value 0 or 1. The
first scaling layer and/or first bit layer in the example
shown in Fig. 3 thus includes the bit MSB of the spectral
value 5, the bit "MSB -1" of the spectral value 4, the bit
"MSB -2" of the spectral value 3, the bit "MSB -1" of the
spectral value 2 and the bit MSB of the spectral value 1.
The certain order of the bits in the first scaling layer is
therefore greater by 3 than the order of the bit in which
the MSB of the masking threshold is located.

The second scaling layer would then include the bits
(MSB -1), (MSB -2), (MSB -3), (MSB -2) and (MSB -1) for the
spectral values 5, 4, 3, 2, and 1. The third scaling layer
would then include the bits (MSB -2), (MSB -3), (LSB +2),
(MSB -3), and (MSB -2), again for the spectral values 5, 4,
3, 2, and 1. The fourth scaling layer, which is preferably
divided into sub-scaling layers, would then include the
bold-edged bits from Fig. 3, that is (MSB -3), (LSB +2),
(LSB +1), (LSB +2), and (MSB -3), again for the spectral
values 5, 4, 3, 2, and 1. A transmission of the first,
second, third, and fourth scaling layers results in
psycho-acoustic transparency, while, if the fourth scaling
layer is left out, a precision loss of 6 dB is incurred.
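
How scaling layers follow the masking threshold rather than
an absolute bit position can be sketched as follows. This
is a hedged illustration with made-up numbers, not the
values of Fig. 3: each spectral value is an integer
magnitude, the order of the MSB of its masking threshold is
assumed to be known, and a layer is identified by its
offset above that MSB. The name scaling_layer_bits is
hypothetical.

```python
def scaling_layer_bits(values, threshold_msb_orders, offset):
    """Bits of one scaling layer defined relative to the masking threshold.

    offset is the distance in bit orders above the MSB of the masking
    threshold; offset 0 is the layer at the threshold itself.
    """
    bits = []
    for value, thr_order in zip(values, threshold_msb_orders):
        order = thr_order + offset
        bits.append((abs(value) >> order) & 1 if order >= 0 else 0)
    return bits


values = [0b1011000, 0b10100]   # hypothetical magnitudes 88 and 20
threshold_msbs = [3, 1]         # hypothetical MSB orders of the threshold

# Four layers, from 3 orders above the threshold MSB down to the MSB itself.
for offset in (3, 2, 1, 0):
    print(offset, scaling_layer_bits(values, threshold_msbs, offset))
```

The same offset thus addresses different absolute bit
positions for spectral values with different thresholds.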

In accordance with the invention, the fourth scaling layer
is sub-divided, for example, into five sub-scaling layers,
where each sub-scaling layer provides a spectral value bit
for one spectral value of the band comprising five
spectral values.

Each sub-scaling layer thus provides a precision increase
of 6 dB/(m = 5) = 1.2 dB.

In order to be able to trace the course of the bit layers
in a decoder in the embodiment represented in Fig. 3, the
masking threshold of hearing and/or the course of the only
just psycho-acoustically significant bits, that is the MSBs
of the masking threshold of hearing, is transmitted to the
decoder within the side information 110 from Fig. 1b.

For this purpose, two alternatives are preferred. These
include the line-wise representation and the band-wise
representation.

Owing to its continuous course, the masking threshold of
hearing may be efficiently represented in a line-wise
representation by the frequency response of an FIR filter
with few coefficients or by a polynomial interpolation.
Here, an individual value of the masking threshold of
hearing is generated for each frequency line.

The band-wise representation makes use of the fact that
the psycho-acoustic masking effects, which are based on the
masking threshold of hearing, may be expressed on a
band-wise basis, where the band division may be in
accordance with the Bark scale and preferably represents a
refinement of the Bark scale. This band-wise representation
is also used in prior art methods for acoustically adapted
audio encoding, such as, for example, MPEG-2 AAC. For
representing the masking threshold of hearing it is thus
sufficient to transmit one value per band.

As has already been set forth, Fig. 2 and Fig. 3 represent
the definition of bit layers of identical psycho-acoustic
significance, for example, in the IntMDCT spectrum. As has
been set forth, the bits are encoded and transmitted on a
layer-by-layer basis, starting from the highest layer. Upon
reaching the bit layer corresponding to the masking
threshold of hearing (the bold-edged bits in Fig. 3), the
transmitted signal is psycho-acoustically transparent. The
transmission of further bit layers, that is of bits below
the bold-edged boxes represented in Fig. 3, increases the
precision and thus the safety distance to the masking
threshold of hearing. Finally, if all available bits are
transmitted, the method operates on a lossless basis. As
has been set forth, an arithmetic encoding is preferably
used for redundancy reduction of the transmitted bits.

The refinement of the precision scaling on the basis of
the inventively used sub-scaling layers, which may be
processed separately from each other in the decoder, is
especially advantageous in the area above the masking
threshold of hearing, at the masking threshold of hearing
and below the masking threshold of hearing (related to the
order of the MSB of the masking threshold of hearing).
Without any refinement of the precision scaling, a
layer-wise transmission of the bits of the IntMDCT spectrum
results in an increase in precision by 6 dB per layer. If,
however, one considers that, at least in noisy signals, the
masking threshold of hearing lies only approximately 6 dB
below the signal, it is obvious that a scaling of the
precision in 6-dB-steps is often too coarse for an
efficient encoding of the only just audible signal
portions.

The subdivision described above yields 1.5-dB-steps if
bands with four spectral values are used and one single
spectral value is arranged in each sub-scaling layer, or
if, for example, bands with eight spectral values are used
and two spectral values are considered in each sub-scaling
layer. This corresponds to the adaptation of precision in
1.5-dB-steps which is also present in MPEG-2 AAC. There, a
band-wise adaptive quantization of continuous spectral
values is effected by means of scaling factors of the form
2^(0.25·n), with n assuming integer values. If n is
increased by 1, the precision of the quantization in MPEG-2
AAC changes by 1.5 dB.
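
The 1.5 dB figure can be checked numerically. The sketch
below assumes that the scaling factors have the form
2^(0.25·n) and that precision is measured as an amplitude
ratio in dB, that is 20·log10 of the ratio; the function
name scale_factor_step_db is hypothetical.

```python
import math


def scale_factor_step_db(exponent_step: float = 0.25) -> float:
    """Level change in dB when n increases by 1 for a gain 2**(exponent_step*n)."""
    return 20.0 * math.log10(2.0 ** exponent_step)


print(round(scale_factor_step_db(), 3))      # one scale factor step
print(round(scale_factor_step_db(1.0), 3))   # one complete bit (factor 2)
```

Four such steps add up to one factor of two, which is why a
complete bit layer corresponds to about 6 dB.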

The inventive concept provides this refinement of the
precision scaling by subdividing the bit layers into
sub-scaling layers, with m sub-layers of one layer being
obtained by distributing each group of m adjacent lines
over the m sub-layers. With each newly transmitted
sub-layer the precision increases by 6/m dB. With m = 4, a
graduation in 1.5-dB-steps is thus also possible. In
contrast to the above-described quantization in the MPEG-2
AAC method, the precision in each sub-layer is increased
for only one of the m spectral lines in the inventive
concept. Since the psycho-acoustic masking effects occur in
the frequency domain on a band-wise and not on a line-wise
basis, the same precision gain per band is obtained by
increasing the precision of one spectral line as by
uniformly increasing the precision in the whole band.

With reference to Figs. 4 and 5, the best modes of
selecting which of the m spectral lines will be refined in
the next sub-layer are described in detail.

Fig. 4 shows a case in which the masking threshold of
hearing is represented on a line-wise basis. The masking
threshold of hearing is plotted as a continuous line. The
MSB of the masking threshold of hearing is plotted above it
in the form of a "cross". The decoding of all scaling
layers lying above, which are not represented in Fig. 4,
has already been completed, such that the spectral values
1, 2, 3, and 4 are present with a precision represented by
"0". The previously transmitted bit represented by "0"
therefore represents the precision of the spectral line in
the decoder. By comparing the value of the previously
processed spectral value in the encoder to the value of
the masking threshold of hearing for this spectral value,
it becomes immediately apparent which spectral value has
so far been transmitted in the least precise manner. In
the example shown in Fig. 4, as may easily be seen, this is
the spectral value 2. The first sub-scaling layer will
therefore obtain the next bit of the spectral value No. 2.

The next spectral value, for the second sub-scaling layer,
is the spectral value No. 4. The spectral value No. 1
should then follow for the third sub-scaling layer and
finally the spectral value No. 3 for the fourth sub-scaling
layer.

The next bit to be coded will therefore be determined from
the frequency line with the greatest difference between the
precision of the previously processed spectral value and
the masking threshold of hearing.

It should be appreciated that this process may be
reproduced in the decoder, such that the decoder is able to
find out, without any additional side information, which
spectral value will be further refined by the sub-scaling
layer to be decoded next, as long as the decoder knows the
continuous course of the psycho-acoustic masking threshold.
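
The selection rule for a line-wise masking threshold can be
sketched as follows. This is a minimal sketch under stated
assumptions: the precision of each line is modelled by the
order of its last transmitted bit, the threshold is given
as a positive linear magnitude per line, and the distance
is measured in bits as order minus log2 of the threshold.
The function name sub_layer_order and the numbers in the
example are hypothetical and not taken from Fig. 4.

```python
import math


def sub_layer_order(last_orders, thresholds):
    """Order in which the lines of a band are refined by sub-scaling layers.

    last_orders[i]: order of the last bit already transmitted for line i.
    thresholds[i]: masking threshold of line i (positive, linear scale).
    The line whose precision lies furthest above its threshold comes
    first; ties keep the lower line index first (sorted() is stable).
    """
    distance = [o - math.log2(t) for o, t in zip(last_orders, thresholds)]
    return sorted(range(len(distance)), key=lambda i: -distance[i])


# Four lines, all refined down to bit order 4 so far.
order = sub_layer_order([4, 4, 4, 4], [12.0, 9.0, 15.0, 10.0])
print([i + 1 for i in order])   # line numbers counted from 1
```

Because the function depends only on data that the decoder
also has (the bits received so far and the transmitted
threshold), encoder and decoder can run it identically
without extra side information.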

Fig. 5 shows the case of the band-wise representation of
the masking threshold of hearing. From Fig. 5 it may be
seen that the bits of the spectral values 2, 3, and 4 may
be considered as sub-scaling layers to be processed next,
since, as compared to the masking threshold of hearing,
they are spaced from the same by the greatest distance. In
contrast to that, the value of the spectral value 1 is
already positioned close to the masking threshold of
hearing, so that the spectral value 1 does not necessarily
have to be refined, whereas the spectral values 2, 3, and 4
do.

In principle, each of the spectral values 2, 3, 4 could be
considered in the next sub-scaling layer. However, noise
shaping may be achieved by considering the absolute value
of the spectral values 2, 3, and 4 as already processed in
the encoder and/or in the decoder. If it turns out that,
for example, six higher-order bits have already been
transmitted for spectral value No. 2, indicating that
spectral value No. 2 is very large, this means, in relative
terms, that this spectral value is already represented in
a fairly precise manner. If, in contrast, it is found that
spectral value No. 3 is a smaller spectral value in that,
for example, only one single higher-order bit has been
transmitted, then, as is preferred in accordance with the
invention, spectral value No. 3 will be processed in a
sub-scaling layer first and spectral value No. 2
afterwards. This is based on the assumption that the
relative precision is more significant for the hearing
impression than the absolute precision.
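
The noise-shaping preference just described can be sketched
as a simple ordering of the candidate lines. The sketch
assumes that the candidates are equally far from the
band-wise masking threshold and that the number of
higher-order bits already transmitted per line is known;
the function name refinement_order and the bit count for
line 4 are hypothetical.

```python
def refinement_order(candidates, bits_sent):
    """Order candidate lines so that lines with fewer transmitted bits,
    i.e. lower relative precision, are refined first."""
    return sorted(candidates, key=lambda line: bits_sent[line])


# Six bits already sent for line 2, one for line 3; line 4 is made up.
bits_sent = {2: 6, 3: 1, 4: 3}
print(refinement_order([2, 3, 4], bits_sent))
```

Small spectral values are refined before large ones, which
favours relative over absolute precision.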

Fig. 6 shows an overall block diagram of an inventive
encoder. The time signal is fed to an input 600 of the
encoder and translated into the frequency domain, for
example, by means of an IntMDCT 602. Parallel to this, the
psycho-acoustic model 84 is in operation, which may in
principle have the same structure as the psycho-acoustic
model 84 represented in Fig. 8. The masking threshold
calculated by the psycho-acoustic model 84 is now, unlike
in Fig. 8, not used for quantizing, but for defining 604
the scaling layers. In particular, in a preferred
embodiment of the present invention, the means 84 provides
the MSB of the masking threshold either on a
per-spectral-value or on a per-band basis, in order to
determine, so to say, the bold-edged boxes represented in
Fig. 3. The means 604 then defines the scaling layers
relative to the order of the MSBs of the masking threshold
(of the bold-edged boxes in Fig. 3).

The means 604 for defining scaling layers controls the
means 102 from Fig. 1a for generating sub-scaling layers
and/or for generating scaling layers, if both scaling
layers and sub-scaling layers are to be employed. In the
embodiment shown in Fig. 3, the means 102 would operate
such that it would generate three complete scaling layers
and feed the same to a means 606 for arithmetic encoding,
and then, for the fourth layer concerning the bits of the
spectral values whose order equals the order of the MSBs
of the masking threshold, would subdivide these bits into
a certain number of sub-scaling layers. After the
arithmetic encoding, the scaling layers and the sub-scaling
layers will be assembled into a bit stream by a bit stream
formation means 608 so as to obtain a scaled/encoded
signal, which may in principle have the structure shown in
Fig. 1b.

The scaled/encoded signal is fed into an input 700 of a
decoder shown in Fig. 7, with a means 702 deformatting the
bit stream shown in Fig. 1b so as to separate the side
information from the sub-scaling layers, etc. An
extraction/decoding means 704 will then successively
conduct an arithmetic decoding of the scaling layers and
the sub-scaling layers, such that the bit patterns of the
individual spectral values can build up one after the
other in a memory, not shown in Fig. 7, which is located on
the decoder side.

Depending on the number of the transmitted scaling layers
and/or depending on the control signal at a control input
of the means 704, the decoder will at some point cease to
decode further scaling layers or sub-scaling layers. If all
scaling layers and sub-scaling layers generated on the
encoder side have been transmitted in the bit stream and
decoded, a lossless encoding/transmission/decoding will
have taken place, and the decoder does not have to conduct
any interpretation of quantized values. The spectral values
obtained after a lossless or almost lossless
encoding/transmission/decoding are fed to a backward
transformation means 706, which, for example, carries out
an inverse IntMDCT (IntMDCT^-1), so as to obtain a decoded
signal at an output 708. If, for example, certain scaling
layers or sub-scaling layers were cut off on account of
the transmission channel, or if the decoder, due to its
structure, was not able to process all scaling layers or
sub-scaling layers, or if the means 704 was controlled so
as to process only a certain number of scaling layers
and/or sub-scaling layers, the inventive decoder will
carry out an interpretation of the spectral value bit
patterns available so far. If not all bit layers of the
spectrum are transmitted, only the higher-order bits will
be available for each spectral value in the decoder.

Being aware of the masking threshold of hearing and of the
number of bit layers generated in total, or which may be
generated in total, in the decoder for the lossless case,
the decoder now determines how many bit layers - and thus
how many bits - have not been transmitted for each
individual spectral value. From these data, the decoder
constructs a quantized spectral value. The easiest approach
is to replace the non-transmitted bits by zeroes. In this
case, the quantizing process will always result in a
rounding towards smaller absolute values.

In accordance with the invention, it is preferred to keep
the mean quantizing error as small as possible. This is
achieved by using a so-called "Uniform Midrise Quantizer",
as described in N. S. Jayant and P. Noll: "Digital Coding
of Waveforms", Prentice-Hall, 1984. This quantizer leaves
the quantizing interval used in quantizing unchanged, but
shifts the quantized value, that is, the representative of
the quantizing interval and thus the interpretation of the
transmitted bits, by a certain value. A shift towards the
centre of the quantizing interval is achieved, for example,
by using the bit pattern "1 0 0 0 ..." for the missing bits.
For the missing low-order bits of a spectral value it is
generally preferred to use bit patterns in the quantizer for
reconstruction which differ from the "rounding bit pattern"
represented by "0 0 0 ...". In other words, the
reconstruction bit pattern includes at least one "1", and
preferably the most significant bit of the reconstruction
bit pattern is a "1".
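The two reconstruction rules described above can be compared in a short sketch. It assumes non-negative integer magnitudes (signs handled separately); the function names and the example prefix are hypothetical and are not taken from the described embodiment.

```python
def reconstruct(received_bits: int, missing: int, midrise: bool = True) -> int:
    """Rebuild a spectral magnitude from its transmitted high-order bits.

    received_bits: integer formed by the transmitted (higher-order) bits.
    missing: number of low-order bit layers that were not transmitted.
    midrise: fill the gap with "1 0 0 ..." instead of "0 0 0 ...".
    """
    value = received_bits << missing       # missing bits replaced by zeroes
    if midrise and missing > 0:
        value |= 1 << (missing - 1)        # set the MSB of the missing bits
    return value


def mean_abs_error(missing: int, midrise: bool) -> float:
    """Mean error over all magnitudes that share one transmitted prefix."""
    prefix = 0b101                         # arbitrary prefix, for illustration
    lowest = prefix << missing
    candidates = range(lowest, lowest + (1 << missing))
    estimate = reconstruct(prefix, missing, midrise)
    return sum(abs(v - estimate) for v in candidates) / len(candidates)


if __name__ == "__main__":
    print(mean_abs_error(3, midrise=False))  # zero fill
    print(mean_abs_error(3, midrise=True))   # "1 0 0" fill
```

With three missing bits, the zero fill always reconstructs the lower edge of the quantizing interval, whereas the "1 0 0" fill reconstructs a value near its centre, which lowers the mean error in this example.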

In the following, detailed reference is made to the
functionality of the encoder shown in Fig. 6 and the decoder
shown in Fig. 7, which include the IntMDCT as a preferred
transform algorithm. The IntMDCT spectrum provides an
integer spectral representation of the audio signal. In
parallel, the psycho-acoustic model in the encoder shown in
Fig. 6 calculates the masking threshold of hearing. As has
been set forth, the masking threshold of hearing can be
encoded efficiently owing to its continuous course and may
be transmitted in the bit stream, for example, by
coefficients of an FIR filter or by a polynomial
interpolation.

For each spectral line, the masking threshold of hearing
yields the number of bits which are not significant in
psycho-acoustic terms, that is, the bits of the spectral
value whose order is less than the order of the MSB of the
masking threshold of hearing for this spectral value. With
reference to Fig. 3, these are the bits below the bold-edged
boxes.

Each magnitude value of the integer spectral values is
represented on a bit-wise basis so as to define, by means of
means 604, bit layers of identical psycho-acoustic
significance along the frequency domain, for example in
parallel to the layer of the still psycho-acoustically
significant bits, with low frequencies being preferred in
the more significant layers. The bits will be ordered along
the significance layers, starting with the most significant
bit. The start layer results either from the theoretical
maximum values, or from an efficiently encoded spectral
envelope, analogous to the encoded masking threshold of
hearing, or from a parallel displacement of the masking
threshold of hearing, for example by 30 dB, which would
correspond to 5 bits.
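As a rough illustration of layers running in parallel to the masking threshold, the following sketch slices integer magnitudes into bit layers relative to the MSB order of the threshold. The function names, the data layout (one list per layer, `None` where a line has no bit of that order) and the example values are assumptions made for this sketch; the preference for low frequencies within a layer is not modelled.

```python
import math


def offset_in_bits(offset_db: float) -> int:
    """Convert a level offset in dB into bit layers (about 6.02 dB per bit)."""
    db_per_bit = 20 * math.log10(2)
    return round(offset_db / db_per_bit)


def bit_layers(magnitudes, threshold_msb, start_offset):
    """Slice magnitudes into layers parallel to the masking threshold.

    Layer d (d = start_offset, start_offset - 1, ...) holds, for spectral
    line i, the bit of order threshold_msb[i] + d, or None if that order
    lies below bit 0 for this line.
    """
    layers = []
    lowest = -max(threshold_msb)           # last layer that still holds a bit
    for d in range(start_offset, lowest - 1, -1):
        layer = []
        for mag, msb in zip(magnitudes, threshold_msb):
            order = msb + d
            layer.append((mag >> order) & 1 if order >= 0 else None)
        layers.append(layer)
    return layers


if __name__ == "__main__":
    print(offset_in_bits(30.0))
    for layer in bit_layers([5, 12, 3], [1, 2, 0], start_offset=2):
        print(layer)
```

The first helper reflects the correspondence of roughly 6 dB per bit that underlies the "30 dB, 5 bits" example in the text.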
An occurrence of a "1" in the layers of high significance is
very unlikely, since only few spectral lines protrude far
above the masking threshold of hearing, such as, for
example, spectral line 5 from Fig. 2 or Fig. 3. Towards the
lower layers, the probability of encountering a "1"
increases and approaches 50 %. It is preferred to apply a
bit-wise arithmetic encoding to a bit sequence arranged in
this way for redundancy reduction.

In an aspect of the present invention, the scalability range
is not only extended as far as psycho-acoustic transparency,
as in MPEG-4 BSAC, but as far as lossless encoding/decoding.
If the total encoded bit sequence and, with a corresponding
representation, also the pertaining signs of the spectral
values are transmitted, the embodiment will operate on a
lossless basis. If only a part of the encoded bit sequence
is transmitted, this will already result in an irrelevance
reduction. If the encoded bit sequence is transmitted as far
as the layer of the only just significant bits, the method
operates only just in the transparent mode. If fewer bits
are transmitted, the bit rate is reduced, which also reduces
the audio/video quality.

If, in addition to these psycho-acoustically significant
layers, further layers are transmitted, the audio signal
(video signal) will be represented with an additional safety
distance to the masking threshold, which enables an almost
lossless representation with great robustness against
post-processing steps.

The number of bits needed for achieving transparency varies
from block to block. If this information is encoded in the
complete lossless bit stream, it may be used for controlling
the bit allocation for achieving a constant bit rate. This
information is exactly available and may be used for any
desired constant bit rate. Thus, for each specified constant
bit rate, an acoustically adapted encoded sub-bit stream may
be taken from the complete lossless encoded bit stream, the
former using the functionality of the locally varying bit
rate.

Finally, transmitting in the side information the number of
bit layers required for achieving transparency enables a
control of the current audio quality transmitted in the
sub-bit stream by comparing this value with the number of
actually transmitted bit layers.

As an example of an integer transform algorithm, the
following refers to the IntMDCT transform algorithm, which
is described in "Audio Coding Based on Integer Transforms",
111th AES Convention, New York, 2001. The IntMDCT is
especially favourable, since it provides the most attractive
properties of the MDCT, such as, for example, good spectral
representation of the audio signal, critical sampling, and
block overlapping.

Fig. 11 shows an overview diagram of the inventive preferred
apparatus for processing time-discrete sampled values
representing an audio signal so as to obtain integer values,
on which the IntMDCT integer transform algorithm operates.
The time-discrete sampled values are windowed by the
apparatus shown in Fig. 11 and optionally translated into a
spectral representation. The time-discrete sampled values
fed into the apparatus at an input 10 are windowed with a
window w having a length corresponding to 2N time-discrete
sampled values so as to obtain integer windowed sampled
values at an output 12, which are suitable to be translated
into a spectral representation by means of a transform and
especially by the means 14 for carrying out an integer DCT.
The integer DCT is implemented to generate N output values
from N input values, in contrast to the MDCT function 408
from Fig. 10a, which only generates N spectral values from
2N windowed sampled values on the basis of the MDCT
equation.

For windowing the time-discrete sampled values, two
time-discrete sampled values which together represent a
vector of time-discrete sampled values are first selected in
a means 16. One time-discrete sampled value selected by the
means 16 is positioned in the first quarter of the window.
The other time-discrete sampled value is positioned in the
second quarter of the window, as is set forth in more detail
with reference to Fig. 13. A rotation matrix of dimension
2 x 2 is now applied to the vector generated by the means
16, with this operation not being carried out directly but
by means of several so-called lifting matrices.

A lifting matrix has the property that it comprises only one
element which depends on the window w and is unequal to "1"
or "0".

The factorization of wavelet transforms into lifting steps
is presented in the technical publication "Factoring Wavelet
Transforms Into Lifting Steps", Ingrid Daubechies and Wim
Sweldens, Preprint, Bell Laboratories, Lucent Technologies,
1996. Generally, a lifting scheme is a simple relation
between perfectly reconstructing filter pairs which comprise
the same low-pass or high-pass filter. Each pair of
complementary filters may be factorized into lifting steps.
In particular, this applies to Givens rotations. Consider
the case in which the poly-phase matrix is a Givens
rotation. Then, the following equation is valid:

$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \sin\alpha & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{\cos\alpha - 1}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \qquad (1)$$

Each of the three lifting matrices to the right of the
equals sign has the value "1" as main diagonal elements.
Further, in each lifting matrix, one off-diagonal element
equals 0, and the other off-diagonal element depends on the
rotation angle α.

The vector will now be multiplied by the third lifting
matrix, i.e. the rightmost lifting matrix in the above
equation, so as to obtain a first result vector. This is
represented by a means 18 in Fig. 11. In accordance with the
invention, the first result vector will now be rounded with
any rounding function mapping the set of real numbers into
the set of integers, as is represented in Fig. 11 by a means
20. At the output of the means 20, a rounded first result
vector is obtained. The rounded first result vector is now
fed into a means 22 for multiplying it by the middle, i.e.
second, lifting matrix so as to obtain a second result
vector, which is again rounded in a means 24 so as to obtain
a rounded second result vector. The rounded second result
vector is now fed into a means 26 for multiplying it by the
lifting matrix on the left in the above equation, i.e. by
the first lifting matrix, so as to obtain a third result
vector, which is finally rounded once more by a means 28 so
as to obtain integer windowed sampled values at the output
12. If a spectral representation of these values is desired,
they have to be processed by the means 14 so as to obtain
integer spectral values at a spectral output 30.
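The sequence of means 18 to 28 can be sketched as follows. This is a minimal illustration, assuming the standard three-step lifting factorization of a Givens rotation with off-diagonal elements (cos α - 1)/sin α and sin α, and using Python's built-in `round` as one possible rounding function; the function name is hypothetical.

```python
import math


def rounded_rotation(x: int, y: int, alpha: float) -> tuple[int, int]:
    """Integer approximation of a Givens rotation by alpha (sin(alpha) != 0).

    Each lifting matrix changes only one vector component, so only that
    component needs to be rounded after the multiplication.
    """
    a = (math.cos(alpha) - 1.0) / math.sin(alpha)
    s = math.sin(alpha)
    x = x + round(a * y)   # third (rightmost) lifting matrix, then rounding
    y = y + round(s * x)   # second (middle) lifting matrix, then rounding
    x = x + round(a * y)   # first (leftmost) lifting matrix, then rounding
    return x, y


if __name__ == "__main__":
    alpha = 0.3
    xi, yi = rounded_rotation(100, 50, alpha)
    # Exact floating-point rotation for comparison.
    xf = 100 * math.cos(alpha) - 50 * math.sin(alpha)
    yf = 100 * math.sin(alpha) + 50 * math.cos(alpha)
    print((xi, yi), (round(xf, 3), round(yf, 3)))
```

The integer result stays close to the exact rotation, while every intermediate vector consists of integers only.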

Preferably the means 14 is implemented as an integer DCT, in
particular as an integer DCT-IV.

The discrete cosine transform of type IV (DCT-IV) having a
length N is given by the following equation:

$$X(m) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} x(k) \cos\left(\frac{\pi}{4N}(2k+1)(2m+1)\right) \qquad (2)$$

The coefficients of the DCT-IV form an orthonormal N x N
matrix. Each orthogonal N x N matrix may be decomposed into
N(N-1)/2 Givens rotations, as is set forth in the technical
publication P. P. Vaidyanathan, "Multirate Systems And
Filter Banks", Prentice Hall, Englewood Cliffs, 1993. It
should be appreciated that further decompositions also
exist.
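The orthonormality stated above can be checked numerically with a small sketch. It assumes the usual definition of the DCT-IV with the scale factor sqrt(2/N); the helper names are hypothetical and the code uses plain Python lists rather than a numerical library.

```python
import math


def dct_iv_matrix(n: int) -> list[list[float]]:
    """Return the n x n DCT-IV matrix with orthonormal scaling."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos(math.pi / (4 * n) * (2 * k + 1) * (2 * m + 1))
             for k in range(n)]
            for m in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def givens_count(n: int) -> int:
    """Number of Givens rotations in the decomposition named in the text."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    c = dct_iv_matrix(8)
    product = matmul(c, c)  # the DCT-IV matrix is symmetric and orthogonal
    deviation = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                    for i in range(8) for j in range(8))
    print(deviation, givens_count(8))
```

Because the matrix is symmetric as well as orthonormal, multiplying it by itself yields the identity up to floating-point error, which is why the same transform can serve as its own inverse.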

With respect to the classification of the various DCT
algorithms, reference should be made to H. S. Malvar,
"Signal Processing With Lapped Transforms", Artech House,
1992. Generally, the DCT algorithms differ in the type of
their basis functions. While the DCT-IV, which is preferred
in the present invention, includes non-symmetric basis
functions, i.e. a cosine quarter wave, a cosine 3/4 wave, a
cosine 5/4 wave, a cosine 7/4 wave, etc., the discrete
cosine transform of, for example, type II (DCT-II) has
axis-symmetric and point-symmetric basis functions. The 0th
basis function has a direct component, the first basis
function is a half cosine wave, the second basis function is
a whole cosine wave, and so on. Owing to the fact that the
DCT-II gives special weight to the direct component, it is
used in video encoding, but not in audio encoding, since in
audio encoding, in contrast to video encoding, the direct
component is not relevant.

In the following, special reference is made to how the
rotation angle α of the Givens rotation depends on the
window function.

An MDCT with a window length of 2N may be reduced to a
discrete cosine transform of type IV with a length N. This
is achieved by explicitly carrying out the TDAC operation in
the time domain and then applying the DCT-IV. With a 50%
overlap, the left half of the window for a block t overlaps
the right half of the preceding block, i.e. the block t-1.
The overlapping part of two successive blocks t-1 and t will
be preprocessed in the time domain, i.e. prior to the
transform, as follows, i.e. is processed between the input
10 and the output 12 from Fig. 11:

$$\begin{pmatrix} \tilde{x}_t(k) \\ \tilde{x}_{t-1}(N-1-k) \end{pmatrix} = \begin{pmatrix} w(N/2+k) & -w(N/2-1-k) \\ w(N/2-1-k) & w(N/2+k) \end{pmatrix} \begin{pmatrix} x_t(N/2+k) \\ x_t(N/2-1-k) \end{pmatrix} \qquad (3)$$

The values designated with a tilde are the values at the
output 12 from Fig. 11, while the x values without a tilde
in the above equation are the values at the input 10 and/or
downstream of the means 16 which are to be selected. The
running index k runs from 0 to N/2-1, while w represents the
window function.

From the TDAC condition for the window function w, the
following relation holds:

$$w^2\left(\frac{N}{2}+k\right) + w^2\left(\frac{N}{2}-1-k\right) = 1 \qquad (4)$$

For certain angles α_k, k = 0, ..., N/2-1, this
preprocessing in the time domain may be written as a Givens
rotation, as has been set forth.

The angle α of the Givens rotation depends on the window
function w as follows:

$$\alpha = \arctan\left[\frac{w(N/2-1-k)}{w(N/2+k)}\right] \qquad (5)$$

It should be appreciated that any window function w may be
employed as long as this TDAC condition is fulfilled.
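Equations (4) and (5) can be tried out for one concrete window. The sketch below assumes a sine window of length 2N, which is only one of the window functions satisfying the TDAC condition; the function names are hypothetical.

```python
import math


def sine_window(n: int, length: int) -> float:
    """Sine window value at index n for a window of the given length (2N)."""
    return math.sin(math.pi * (n + 0.5) / length)


def tdac_residual(big_n: int, k: int) -> float:
    """Deviation of equation (4) from 1 for index k."""
    w1 = sine_window(big_n // 2 + k, 2 * big_n)
    w2 = sine_window(big_n // 2 - 1 - k, 2 * big_n)
    return w1 * w1 + w2 * w2 - 1.0


def rotation_angles(big_n: int) -> list[float]:
    """Angles alpha_k for k = 0 .. N/2 - 1 according to equation (5)."""
    angles = []
    for k in range(big_n // 2):
        numerator = sine_window(big_n // 2 - 1 - k, 2 * big_n)
        denominator = sine_window(big_n // 2 + k, 2 * big_n)
        angles.append(math.atan(numerator / denominator))
    return angles


if __name__ == "__main__":
    for k, alpha in enumerate(rotation_angles(8)):
        print(k, round(alpha, 6), abs(tdac_residual(8, k)) < 1e-12)
```

Because of equation (4), cos α_k equals w(N/2 + k) and sin α_k equals w(N/2 - 1 - k), so the windowing step has exactly the form of a rotation.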

In the following, a cascaded encoder and decoder are
described by means of Fig. 12. The time-discrete sampled
values x(0) to x(2N-1), which are windowed together by one
window, are selected by the means 16 from Fig. 11 such that
the sampled value x(0) and the sampled value x(N-1), i.e. a
sampled value from the first quarter of the window and a
sampled value from the second quarter of the window, are
selected so as to form the vector at the output of the means
16. The intersecting arrows schematically represent the
lifting multiplications and subsequent roundings of the
means 18, 20 and/or 22, 24 and/or 26, 28 so as to obtain the
integer windowed sampled values at the input of the DCT-IV
blocks.

When the first vector has been processed as described above,
a second vector is selected from the sampled values x(N/2-1)
and x(N/2), i.e. again a sampled value from the first
quarter of the window and a sampled value from the second
quarter of the window, and processed by the algorithm
described in Fig. 11. Analogously, all the other sampled
value pairs from the first and second quarter of the window
will be processed. The same processing will be carried out
for the third and fourth quarter of the first window. 2N
windowed integer sampled values are now present at the
output 12, which will now be fed, as represented in Fig. 12,
into a DCT-IV transform. In particular, the integer windowed
sampled values of the second and third quarter will be fed
into a DCT-IV. The windowed integer sampled values of the
first quarter of a window will be processed in a preceding
DCT-IV together with the windowed integer sampled values of
the fourth quarter of the preceding window. Analogously, the
fourth quarter of the windowed integer sampled values in
Fig. 12 will be fed into a DCT-IV transform together with
the first quarter of the next window. The middle integer
DCT-IV transform 32 shown in Fig. 12 now provides N integer
spectral values y(0) to y(N-1). These integer spectral
values may now simply be subjected to an entropy-encoding
without any intermediate quantizing being required, since
the inventive windowing and transform provide integer output
values.

A decoder is shown in the right half of Fig. 12. The
decoder, consisting of retransform and inverse windowing,
works inversely to the encoder. It is known that an inverse
DCT-IV may be used for the inverse transform of a DCT-IV, as
shown in Fig. 12. The output values of the decoder DCT-IV
34, as shown in Fig. 12, will now be inversely processed
with the corresponding values of the preceding transform
and/or the subsequent transform in accordance with the
present invention so as to generate, from the integer
windowed sampled values at the output of the means 34 and/or
of the preceding and subsequent transform, time-discrete
audio sampled values x(0) to x(2N-1).

The output-side operation inventively takes place by an
inverse Givens rotation, i.e. such that the blocks 26, 28
and/or 22, 24 and/or 18, 20 are passed through in the
opposite direction. This is illustrated in more detail by
means of the second lifting matrix from equation (1). If (in
the encoder) the second result vector is formed by
multiplication of the rounded first result vector by the
second lifting matrix (means 22), the following expression
results:

$$(x, y) \mapsto (x,\; y + x \sin\alpha) \qquad (6)$$

The values x, y on the right side of equation (6) are
integers. This, however, does not apply to the value
x sin α. Here, the rounding function r has to be introduced,
as in the following equation:

$$(x, y) \mapsto (x,\; y + r(x \sin\alpha)) \qquad (7)$$

Means 24 carries out this operation.

The inverse mapping (in the decoder) is defined as follows:

$$(x', y') \mapsto (x',\; y' - r(x' \sin\alpha)) \qquad (8)$$

From the minus sign in front of the rounding operation it is
obvious that the integer approximation of the lifting step
may be reversed without any error being introduced. Applying
this approximation to each of the three lifting steps
results in an integer approximation of the Givens rotation.
The rounded rotation (in the encoder) may be inverted (in
the decoder) without introducing an error, namely by passing
through the inverse rounded lifting steps in inverse order,
i.e. if the algorithm from Fig. 11 is carried out from the
bottom to the top during decoding.
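The reversibility argument of equations (7) and (8) can be verified directly. The sketch below uses Python's built-in `round` as the rounding function r, which is only one admissible choice; the function names are hypothetical.

```python
import math


def lift(x: int, y: int, alpha: float) -> tuple[int, int]:
    """Equation (7): (x, y) -> (x, y + r(x sin alpha))."""
    return x, y + round(x * math.sin(alpha))


def unlift(x: int, y: int, alpha: float) -> tuple[int, int]:
    """Equation (8): (x', y') -> (x', y' - r(x' sin alpha))."""
    return x, y - round(x * math.sin(alpha))


if __name__ == "__main__":
    alpha = 0.7
    pairs = [(x, y) for x in range(-20, 21) for y in range(-20, 21)]
    # x passes through unchanged, so the decoder recomputes the identical
    # rounded term and subtracts it again: the step is exactly invertible.
    print(all(unlift(*lift(x, y, alpha), alpha) == (x, y) for x, y in pairs))
```

The key design point is that the rounded term depends only on the component that the lifting step leaves untouched, so encoder and decoder always round the same number.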

If the rounding function r is point-symmetric, the inverse
rounded rotation is identical with the rounded rotation with
the angle -α and is as follows:

$$\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} 1 & \frac{1 - \cos\alpha}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -\sin\alpha & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{1 - \cos\alpha}{\sin\alpha} \\ 0 & 1 \end{pmatrix} \qquad (9)$$

The lifting matrices for the decoder, i.e. for the inverse
Givens rotation, result in this case directly from equation
(1) by merely replacing the expression "sin α" by the
expression "-sin α".

In the following, the decomposition of a common MDCT with
overlapping windows 40 to 46 is once more shown by means of
Fig. 13. The windows 40 to 46 each have an overlap of 50%.
Per window, Givens rotations are first carried out within
the first and the second quarter of a window and/or within
the third and fourth quarter of a window, as is
schematically represented by the arrows 48. Then, the
rotated values, i.e. the windowed integer sampled values,
will be fed into an N-to-N DCT such that the second and the
third quarter of a window and/or the fourth quarter of a
window and the first quarter of a subsequent window will
always be translated together into a spectral representation
by means of a DCT-IV algorithm.

In accordance with the invention, the usual Givens rotations
are decomposed into lifting matrices, which are carried out
sequentially, wherein a rounding step is carried out after
each lifting matrix multiplication, such that the
floating-point numbers are rounded immediately after they
arise and such that, prior to each multiplication of a
result vector by a lifting matrix, the result vector only
comprises integers.

Thus, the output values always remain integer, wherein it is
preferred to use integer input values. This does not
represent any restriction, since PCM sampled values, as
stored on a CD, are integer values whose value range varies
depending on the bit width, i.e. depending on whether the
time-discrete digital input values are 16-bit values or
24-bit values. Yet, as has been set forth, the whole process
is invertible by carrying out the inverse rotations in
inverse order. In accordance with the invention, there thus
exists an integer approximation of the MDCT with perfect
reconstruction, that is, a lossless transform.

The inventive transform provides integer output values
instead of floating-point values. It provides a perfect
reconstruction such that no errors will be introduced if a
forward and then a backward transform are carried out. In
accordance with a preferred embodiment of the present
invention, the transform is a replacement for the modified
discrete cosine transform. Other transform methods may also
be carried out on an integer basis as long as a
decomposition into rotations and a decomposition of the
rotations into lifting steps is possible.

The integer MDCT in accordance with the present invention
provides the most favorable properties of the MDCT. It has
an overlapping structure, as a result of which a better
frequency selectivity than with non-overlapping block
transforms may be obtained. On the basis of the TDAC
operation, which has already been taken into account when
windowing prior to the transform, a critical sampling is
maintained such that the total number of spectral values
representing an audio signal equals the total number of
input sampled values.

Compared to a normal MDCT providing floating-point values,
the inventive integer transform shows an increase in noise
only in the spectral range where there is little signal
level, while this noise increase is not noticeable at
significant signal levels. In addition, the inventive
integer processing is suitable for an efficient hardware
implementation, since only multiplication steps are used
which may easily be decomposed into shift/add steps, which
may be implemented easily and quickly in hardware.

The inventive integer transform provides a good spectral
representation of the audio signal and yet remains in the
range of the integers. If applied to tonal parts of an audio
signal, this results in a good energy concentration. Thus,
an efficient lossless encoding scheme may be built up by
simply cascading the inventive windowing/transform
represented in Fig. 11 with an entropy encoder. In
particular, a stacked encoding using escape values, as used
in MPEG AAC, is favorable for the present invention. It is
preferred to scale down all values by a certain power until
they fit in a desired code table, and then to additionally
encode the omitted least significant bits. Compared to the
alternative of using larger code tables, the described
alternative is less expensive with respect to the storage
required for the code table. A nearly lossless encoder might
also be obtained by simply leaving out certain ones of the
least significant bits.
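The scale-down step described above might look as follows. The sketch assumes that the scaling is by a power of two (a right shift), that the values are non-negative magnitudes, and that the code table covers the range 0 to `table_max`; all names are hypothetical, and the actual entropy coding of the scaled values is left out.

```python
def split_for_table(values, table_max):
    """Scale magnitudes down by a power of two until all fit the code table.

    Returns (shift, scaled, lsbs): the scaled values would be looked up in
    the code table, the lsbs carry the bits removed by the shift.
    """
    shift = 0
    while (max(values) >> shift) > table_max:
        shift += 1
    mask = (1 << shift) - 1
    scaled = [v >> shift for v in values]
    lsbs = [v & mask for v in values]
    return shift, scaled, lsbs


def merge(shift, scaled, lsbs):
    """Inverse of split_for_table: restore the original magnitudes."""
    return [(s << shift) | low for s, low in zip(scaled, lsbs)]


if __name__ == "__main__":
    values = [3, 70, 15, 200]
    shift, scaled, lsbs = split_for_table(values, table_max=15)
    print(shift, scaled, lsbs)
    print(merge(shift, scaled, lsbs) == values)            # lossless path
    print(merge(shift, scaled, [0] * len(values)))         # LSBs left out
```

Transmitting the separated least significant bits gives the lossless case; omitting some or all of them corresponds to the nearly lossless variant mentioned in the text.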






In particular for tonal signals, an entropy-encoding of the
integer spectral values enables a high encoding gain. For
transient parts of the signal, the encoding gain is low,
namely on account of the flat spectrum of the transient
signal, i.e. on account of a low number of spectral values
which are equal to or almost 0. As is described in J. Herre,
J. D. Johnston: "Enhancing the Performance of Perceptual
Audio Coders by Using Temporal Noise Shaping (TNS)", 101st
AES Convention, Los Angeles, 1996, Preprint 4384, this
flatness, however, may be exploited by using a linear
prediction in the frequency domain. One alternative is a
prediction with an open loop. Another alternative is the
predictor with a closed loop. The first alternative, i.e.
the predictor with an open loop, is referred to as TNS. The
quantizing after the prediction results in an adaptation of
the resulting quantizing noise to the time structure of the
audio signal and prevents pre-echoes in psycho-acoustic
audio encoding. For a lossless audio encoding, the second
alternative, i.e. a predictor with a closed loop, is more
suitable, since the prediction with a closed loop allows an
accurate reconstruction of the input signal. If this
technology is applied to an inventively generated spectrum,
a rounding step has to be carried out after each step of the
prediction filter so as to remain in the range of the
integers. By using the inverse filter and the same rounding
function, the original spectrum may be accurately
reproduced.

In order to utilize the redundancy between two channels for
data reduction, a middle-side encoding may be employed on a
lossless basis if a rounded rotation having an angle π/4 is
used. As compared to the alternative of calculating the sum
and difference of the left and right channel of a stereo
signal, the rounded rotation provides the advantage of
energy conservation. The use of so-called joint-stereo
encoding techniques may be turned on or off for each band,
as is done in the MPEG AAC standard. Further rotation angles
may also be considered so as to be able to reduce a
redundancy between two channels in a more flexible manner.

Depending on practical circumstances, the inventive encoder
concept and/or the inventive decoder concept may be
implemented in hardware or in software. The implementation
may be effected on a digital storage medium, in particular
on a floppy disk or a CD, with electronically readable
control signals, which may cooperate with a programmable
computer system so that the corresponding method is carried
out. Generally, the invention also consists in a computer
program product having a program code stored on a
machine-readable carrier for carrying out the inventive
encoding method or the inventive decoding method when the
computer program product executes on a computer. In other
words, the invention thus represents a computer program with
a program code for carrying out the method for decoding
and/or for carrying out the method for encoding when the
computer program executes on a computer.

What is claimed is:



1. An apparatus for scalable encoding of a spectrum of a
signal including audio and/or video information, with
the spectrum comprising binary spectral values, the
apparatus comprising:

a generator (102) for generating a first sub-scaling
layer using bits of a certain order of a first number
of the binary spectral values in a band, with the first
number being greater than or equal to "1" and less than
a total number of the binary spectral values in the
band, and for generating a second sub-scaling layer
using bits of the certain order of a second number of
the binary spectral values, with the generator being
implemented so as to select the second number of the
binary spectral values, such that the number is greater
than or equal to "1" and less than the total number of
the binary spectral values in the band, and to further
determine the second number of the spectral values,
such that the number comprises at least one binary
spectral value which is not contained in the first
number of binary spectral values; and

a former for forming an encoded signal, with the former
being implemented so as to include the first
sub-scaling layer and the second sub-scaling layer into
the encoded signal such that the first and the second
sub-scaling layer (113a, 113b) are separately decodable
from each other.

2. The apparatus in accordance with claim 1, further
comprising:

a full-scaling layer generator for generating a
full-scaling layer using all bits with an order, which
is different from the certain order, in the band, and
with the former being further implemented so as to
include the full-scaling layer in the bit stream, such
that it is independently decodable from the first and
the second sub-scaling layer (113a, 113b).

3. The apparatus in accordance with claim 1, wherein the
binary spectral values are quantized, with the
apparatus further comprising:

a calculator for calculating orders of most significant
bits of a psycho-acoustic masking threshold for the
bands; and

a definer for defining scaling layers of the bits of
the binary spectral values, with a scaling layer
comprising bits of the binary spectral values, the
orders of which are in a certain difference to the
orders of the most significant bits of the
psycho-acoustic masking threshold for the bands or the
orders of which are equal to the orders of the most
significant bits of the psycho-acoustic masking
threshold for these bands.

4. The apparatus in accordance with claim 3,

wherein the generator is implemented so as to use as
bits of a certain order the bits of the binary spectral
values, the difference of which to the order of the
most significant bits of the psycho-acoustic masking
threshold in the band is equal to "+1", "0" and/or
"-1".

5. The apparatus in accordance with claim 3,

wherein the calculator is implemented so as to
determine for each spectral value in the band an order
of a most significant bit or to determine an order of a
most significant bit of the psycho-acoustic masking
threshold for the entire band.

6. The apparatus in accordance with claim 3, wherein the
former is further implemented so as to include
information on the psycho-acoustic masking threshold as
side information into the encoded signal.

7. The apparatus in accordance with claim 1,

wherein the first sub-scaling layer is decodable prior
to the second sub-scaling layer, and

wherein the generator is implemented so as to select
for the first number of the binary spectral values the
spectral value(s) by which a maximum precision gain for
the band may be achieved.

The apparatus in accordance with claim 1, 

wherein the first sub-scaling layer is decodeable 
prior to the second sub-scaling layer, and 

wherein the t?>ea - a! - 8 — B-O-Si — #^«? — g e fi e - g - ati ng — generator '^:h:<^ 
.:^.i.:f;.^^,....f^p^f^...:^Mi^^....^^ s Implemented 

so as to use for the first sub-scaling layer the bi- 
nary spectral value which, represented by the bits in 
higher scaling layers, comprises the greatest differ- 
ence to a psycho-acoustic masking threshold for the 
spectral value in the band. 
9. The apparatus in accordance with claim 1,

wherein the generator is implemented so as to use for the
first sub-scaling layer the binary spectral value which,
represented by the bits in higher scaling layers, is the
smallest quantized spectral value in the band.
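
By way of illustration only, the two selection rules recited above (greatest difference to the masking threshold, and smallest quantized spectral value) may be sketched as follows. The function names, the use of non-negative integer magnitudes, and the reading of "difference" as an absolute numerical difference are assumptions made for this sketch.

```python
def partial_value(value: int, order: int) -> int:
    """Value as represented only by the bits above `order`,
    i.e. by the bits already carried in higher scaling layers."""
    return (value >> (order + 1)) << (order + 1)


def pick_greatest_difference(values, thresholds, order):
    """Index of the spectral value whose partial representation
    differs most from its masking threshold (assumed: absolute difference)."""
    diffs = [abs(partial_value(v, order) - t)
             for v, t in zip(values, thresholds)]
    return max(range(len(values)), key=diffs.__getitem__)


def pick_smallest_value(values, order):
    """Index of the spectral value that is smallest when represented
    only by the bits in higher scaling layers."""
    partials = [partial_value(v, order) for v in values]
    return min(range(len(values)), key=partials.__getitem__)
```

In case of a tie both functions return the first matching index, which is merely a convention of this sketch.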

10. The apparatus in accordance with claim 1,

wherein the spectral values have been generated by an
integer MDCT from time-sampled values of the signal.

11. The apparatus in accordance with claim 1, wherein the
spectral values have been quantized using a
psycho-acoustic and/or psycho-optical model.

12. The apparatus in accordance with claim 11,

wherein the generator is implemented so as to use a
constant certain order of bits in the bands.

13. The apparatus in accordance with claim 11,

wherein the certain order includes the least significant
order of the bits of the quantized binary spectral values.

14. The apparatus in accordance with claim 1,

wherein a band comprises m spectral values,

with m being greater than or equal to and

wherein the generator is implemented so as to calculate
the first and second number of sub-scaling layers, such
that they are at a maximum equal to m and at a minimum
equal to "1", wherein, in the case in which m sub-scaling
layers are present, each sub-scaling layer includes a bit
of the certain order of exactly one spectral value, with
one spectral value being present only in exactly one
sub-scaling layer for the certain order.
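
By way of illustration only, the division of the bits of one order into sub-scaling layers may be sketched as follows. The function name, the grouping argument and the (index, bit) pair representation are assumptions made for this sketch; the spectral values are assumed to be non-negative integers.

```python
def split_bit_plane(values, order, groups):
    """Split the bits of one order into sub-scaling layers.

    `groups` lists, per sub-scaling layer, the indices of the spectral
    values it covers. Each index must appear in exactly one group, so a
    spectral value is present in exactly one sub-scaling layer.
    """
    flat = sorted(i for group in groups for i in group)
    if flat != list(range(len(values))):
        raise ValueError("each spectral value must be in exactly one group")
    return [[(i, (values[i] >> order) & 1) for i in group]
            for group in groups]


# Limiting case with m = 4: m sub-scaling layers, one bit each.
band = [5, 2, 7, 0]
finest = split_bit_plane(band, 1, [[0], [1], [2], [3]])
```

With fewer groups the same function yields coarser sub-scaling layers, down to a single group that covers the whole band.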

15. The apparatus in accordance with claim 14, wherein m
is equal to 

16. The apparatus in accordance with claim 1,

wherein the generator is implemented so as to carry out
an arithmetic encoding of the first and/or second number
of bits of the quantized spectral values of the certain
order.

17. An apparatus for scalable decoding an encoded signal
comprising a first and a second sub-scaling layer, with
the first sub-scaling layer comprising bits of a certain
order of a first number of binary spectral values in a
band, with the second sub-scaling layer comprising bits
of the certain order of a second number of binary
spectral values in the band, and with the second number
comprising at least one spectral value not contained in
the first number, the apparatus comprising:

an extractor for extracting the first sub-scaling layer
from the encoded signal and the second sub-scaling layer
from the encoded signal; and

a processor for processing the first sub-scaling layer
and the second sub-scaling layer so as to determine the
bits of the certain order of the binary quantized
spectral values in the band.

18. The apparatus in accordance with claim 17,

wherein the first number of the binary spectral values
for the first sub-scaling layer is selected so as to
achieve a maximum precision gain for a band,

wherein the extractor is implemented so as to extract the
first sub-scaling layer prior to the second sub-scaling
layer.

19. A method for scalable encoding a spectrum of a signal
including audio and/or video information, with the
spectrum comprising binary spectral values, the method
comprising:

generating a first sub-scaling layer using bits of a
certain order of the first number of binary spectral
values in a band, with the first number being greater
than or equal to "1" and less than a total number of the
binary spectral values in the band, and generating a
second sub-scaling layer using bits of the certain order
of a second number of binary spectral values, wherein the
step of generating comprises selecting a second number of
the binary spectral values, such that the number is
greater than or equal to "1" and less than the total
number of the binary spectral values in the band, and
determining the second number of the spectral values
further such that the number comprises at least one
binary spectral value which is not contained in the first
number of binary spectral values; and

forming an encoded signal, the step of forming including
the first sub-scaling layer and the second sub-scaling
layer into the encoded signal such that the first and the
second sub-scaling layers are separately decodable from
each other.
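
By way of illustration only, the steps of generating and forming may be sketched as follows. The function names and the byte layout (a count byte followed by index and bit bytes for each sub-scaling layer) are hypothetical choices made for this sketch and are not a format described in this document; the spectral values are assumed to be non-negative integers, and indices and counts are assumed to fit in one byte.

```python
def generate_sublayer(values, order, indices):
    """Collect the bit of the given order for each selected spectral value."""
    return [(i, (values[i] >> order) & 1) for i in indices]


def check_selection(first, second, total):
    """Check the constraints recited for the two selections."""
    for selection in (first, second):
        if not 1 <= len(selection) < total:
            raise ValueError("selection size must be >= 1 and < band size")
    if not set(second) - set(first):
        raise ValueError("second selection needs a value not in the first")


def form_encoded_signal(sublayers):
    """Hypothetical container: each sub-scaling layer is written as a
    count byte followed by (index, bit) byte pairs, so each layer can be
    parsed without reading the bits of the other layer."""
    out = bytearray()
    for layer in sublayers:
        out.append(len(layer))
        for index, bit in layer:
            out += bytes((index, bit))
    return bytes(out)


def encode_bit_plane(values, order, first, second):
    """Generate two sub-scaling layers for one bit order and pack them."""
    check_selection(first, second, len(values))
    layers = [generate_sublayer(values, order, selection)
              for selection in (first, second)]
    return form_encoded_signal(layers)
```

A decoder that stops after the first count-prefixed block obtains only the first sub-scaling layer, which is the sense in which the layers are separately decodable in this sketch.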

20. A method for scalable decoding an encoded signal
comprising a first and a second sub-scaling layer, with
the first sub-scaling layer comprising bits of a certain
order of a first number of binary spectral values in a
band, with the second sub-scaling layer comprising bits
of the certain order of a second number of binary
spectral values in the band, and with the second number
comprising at least one spectral value not contained in
the first number, the method comprising:

extracting the first sub-scaling layer from the encoded
signal and the second sub-scaling layer from the encoded
signal; and

processing the first sub-scaling layer and the second 
sub-scaling layer so as to determine the bits of the 
certain order of the binary quantized spectral values 
in the band. 
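
By way of illustration only, the steps of extracting and processing may be sketched as follows. The function names and the byte layout (for each sub-scaling layer a count byte followed by index and bit bytes) are hypothetical choices made for this sketch and are not a format described in this document.

```python
def extract_sublayers(encoded: bytes, count: int = 2):
    """Extract `count` sub-scaling layers from the hypothetical container.

    Each layer is assumed to be stored as one count byte followed by
    that many (index, bit) byte pairs.
    """
    layers, pos = [], 0
    for _ in range(count):
        n = encoded[pos]
        pos += 1
        layer = [(encoded[pos + 2 * k], encoded[pos + 2 * k + 1])
                 for k in range(n)]
        pos += 2 * n
        layers.append(layer)
    return layers


def process_sublayers(layers, band_size: int):
    """Determine the bits of the certain order for the band.

    Positions not covered by any decoded sub-scaling layer stay None,
    which models decoding only some of the layers.
    """
    bits = [None] * band_size
    for layer in layers:
        for index, bit in layer:
            bits[index] = bit
    return bits


encoded = bytes([2, 1, 1, 2, 1, 2, 0, 0, 3, 0])
layers = extract_sublayers(encoded)
```

Passing only the first extracted layer to `process_sublayers` refines part of the band, while passing both layers yields the complete bit plane of the certain order.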

21. Computer program having a program code for carrying
out, when the program executes on a computer, a method
certain order of the binary quantized spectral values
in the band.
Method and Apparatus for Scalable Encoding and Method and
Apparatus for Scalable Decoding 

Abstract 

An apparatus for scalable encoding a spectrum of a signal
including audio and/or video information, with the spectrum
comprising binary spectral values, includes a means for
generating a first sub-scaling layer and a second
sub-scaling layer in addition to a means for forming the
encoded signal, with the means for forming being
implemented so as to include the first sub-scaling layer and
the second sub-scaling layer into the encoded signal such
that the first and the second sub-scaling layers are
separately decodable from each other. In contrast to a
full-scaling layer, a sub-scaling layer includes only the
bits of a certain order of a part of the binary spectral
values in the band, so that, by additionally decoding a
sub-scaling layer, a more finely controllable and a more
finely scalable precision gain may be achieved.



